Abstract

Adaptive networks rely on in-network and collaborative processing among distributed agents to 
deliver enhanced performance in estimation and inference tasks. Information is exchanged among the 
nodes, usually over noisy links. The combination weights that are used by the nodes to fuse information 
from their neighbors play a critical role in influencing the adaptation and tracking abilities of the network. 
This paper first investigates the mean-square performance of general adaptive diffusion algorithms in the 
presence of various sources of imperfect information exchanges, quantization errors, and model non- 
stationarities. Among other results, the analysis reveals that link noise over the regression data modifies 
the dynamics of the network evolution in a distinct way, and leads to biased estimates in steady-state. 
The analysis also reveals how the network mean-square performance is dependent on the combination 
weights. We use these observations to show how the combination weights can be optimized and adapted. 
Simulation results illustrate the theoretical findings and match the theory well.

I. Introduction

An adaptive network consists of a collection of agents that are interconnected to each other and
solve distributed estimation and inference problems in a collaborative manner. Two useful strategies
that enable adaptation and learning over such networks in real-time are the incremental strategy [1]-[7]
and the diffusion strategy [8]-[12]. Incremental strategies rely on the use of a Hamiltonian cycle, i.e., a
cyclic path that covers all nodes in the network, which is generally difficult to enforce since determining
a Hamiltonian cycle is an NP-hard problem. In addition, cyclic trajectories are not robust to node or link
failure. In comparison, diffusion strategies are scalable, robust, and able to match well the performance of
incremental networks. In adaptive diffusion implementations, information is processed locally at the nodes
and then diffused in real-time across the network. Diffusion strategies were originally proposed in [8]-[10]
and further extended and studied in [11]-[17]. They have been applied to model self-organized and
complex behavior encountered in biological networks, such as fish schooling [18], bird flight formations
[19], and bee swarming [20]. Diffusion strategies have also been applied to online learning of Gaussian
mixture models [21], [22] and to general distributed optimization problems [23]. There have also been
several useful works in the literature on distributed consensus-type strategies, with application to
multi-agent formations and distributed processing [24]-[31]. The main difference between these works and the
diffusion approach of [9], [11], [12] is the latter's emphasis on the role of adaptation and learning over
networks.

In the original diffusion least-mean-squares (LMS) strategy [9], [11], the weight estimates that are
exchanged among the nodes can be subject to quantization errors and additive noise over the
communication links. Studying the degradation in mean-square performance that results from these particular
perturbations can be pursued, for both incremental and diffusion strategies, by extending the mean-square
analysis already presented in [9], [11], in the same manner that the tracking analysis of conventional
stand-alone adaptive filters was obtained from the counterpart results in the stationary case (as explained in
[32, Ch. 21]). Useful results along these lines, which study the effect of link noise during the exchange of the
weight estimates, already appear for the traditional diffusion algorithm in the works [33]-[36] and for
consensus-based algorithms in [37], [38]. In this paper, our objective is to go beyond these earlier studies
by taking into account additional effects, and by considering a more general algorithmic structure. The
reason for this level of generality is that the analytical results will help reveal which noise sources
influence the network performance more seriously, in what manner, and at what stage of the adaptation
process. The results will suggest important remedies and mechanisms to adapt the combination weights
in real-time. Some of these insights are hard to get if one focuses solely on noise during the exchange
of the weight estimates. The analysis will further show that noise during the exchange of the regression
data plays a more critical role than other sources of imperfection: this particular noise alters the learning
dynamics and modes of the network, and biases the weight estimates. Noises related to the exchange of
other pieces of information do not alter the dynamics of the network but contribute to the deterioration
of the network performance.

To arrive at these results, we first consider a generalized analysis that applies to a
broad class of diffusion adaptation strategies (see (5)-(7) further ahead; this class includes the original
diffusion strategies (3) and (4) as two special cases). The analysis allows us to account for various
sources of information noise over the communication links. We allow for noisy exchanges during each
of the three processing steps of the adaptive diffusion algorithm (the two combination steps (5) and
(7) and the adaptation step (6)). In this way, we are able to examine how the three sets of combination
coefficients $\{a_{1,lk}, c_{lk}, a_{2,lk}\}$ in (5)-(7) influence the propagation of the noise signals through the network
dynamics. Our results further reveal how the network mean-square-error performance is dependent on
these combination weights. Following this line of reasoning, the analysis leads to algorithms (124)
and (128) further ahead for choosing the combination coefficients to improve the steady-state network
performance.

It should be noted that several combination rules, such as the Metropolis rule [39] and the maximum
degree rule [40], were proposed previously in the literature, especially in the context of consensus-based
iterations [40]-[42]. These schemes, however, usually suffer performance degradation in the presence of
noisy information exchange since they ignore the network noise profile [15]. When the noise variance
differs across the nodes, it becomes necessary to design combination rules that are aware of this variation
as outlined further ahead in Section VI-B. Moreover, in a mobile network [18] where nodes are on the
move and where neighborhoods evolve over time, it is even more critical to employ adaptive combination
strategies that are able to track the variations in the noise profile in order to cope with such dynamic
environments. This issue is taken up in Section VI-C.

A. Notation 

We use lowercase letters to denote vectors, uppercase letters for matrices, plain letters for deterministic
variables, and boldface letters for random variables. We also use $(\cdot)^*$ to denote conjugate transposition,
$\mathrm{Tr}(\cdot)$ for the trace of its matrix argument, $\rho(\cdot)$ for the spectral radius of its matrix argument, $\otimes$ for
the Kronecker product, and $\mathrm{vec}(\cdot)$ for a vector formed by stacking the columns of its matrix argument.
We further use $\mathrm{diag}\{\cdots\}$ to denote a (block) diagonal matrix formed from its arguments, and $\mathrm{col}\{\cdots\}$
to denote a column vector formed by stacking its arguments on top of each other. All vectors in our
treatment are column vectors, with the exception of the regression vectors, $u_{k,i}$, and the associated noise
signals, $v^{(u)}_{lk,i}$, which are taken to be row vectors for convenience of presentation.
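
For readers who wish to experiment numerically, the operators listed above map onto NumPy roughly as in the following sketch. The helper names (`vec`, `col`, `block_diag`, `spectral_radius`) are illustrative choices made here and are not part of the paper; the Kronecker product is available directly as `np.kron`.

```python
import numpy as np

def vec(X):
    """vec(.): stack the columns of X into one long vector (column-major order)."""
    return np.asarray(X).reshape(-1, order="F")

def col(*blocks):
    """col{...}: stack the given vectors on top of each other."""
    return np.concatenate([np.ravel(b) for b in blocks])

def block_diag(*blocks):
    """diag{...}: block-diagonal matrix built from square blocks."""
    sizes = [b.shape[0] for b in blocks]
    out = np.zeros((sum(sizes), sum(sizes)), dtype=np.result_type(*blocks))
    pos = 0
    for b, n in zip(blocks, sizes):
        out[pos:pos + n, pos:pos + n] = b
        pos += n
    return out

def spectral_radius(X):
    """rho(.): largest eigenvalue magnitude of a square matrix."""
    return np.max(np.abs(np.linalg.eigvals(X)))

def ctranspose(X):
    """(.)^*: conjugate transposition."""
    return np.conj(X).T
```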

II. Diffusion Algorithms with Imperfect Information Exchange 

We consider a connected network consisting of N nodes. Each node k collects scalar measurements
$d_k(i)$ and $1 \times M$ regression data vectors $u_{k,i}$ over successive time instants $i \ge 0$. Note that we use
parentheses to refer to the time-dependence of scalar variables, as in $d_k(i)$, and subscripts to refer to the
time-dependence of vector variables, as in $u_{k,i}$. The measurements across all nodes are assumed to be
related to an unknown $M \times 1$ vector $w^o$ via a linear regression model of the form [32]:

$$d_k(i) = u_{k,i}\, w^o + v_k(i) \tag{1}$$

where $v_k(i)$ denotes the measurement or model noise with zero mean and variance $\sigma^2_{v,k}$. The vector $w^o$
in (1) denotes the parameter of interest, such as the parameters of some underlying physical phenomenon,
the taps of a communication channel, or the location of food sources or predators. Such data models are
also useful in studies on hybrid combinations of adaptive filters [43]-[47].
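
As an illustration of the data model (1), the sketch below draws synthetic measurements for a single node. It assumes real-valued Gaussian regressors and noise, which is a simplification made here for the example; the function name `generate_measurements` and its arguments are hypothetical.

```python
import numpy as np

def generate_measurements(w_o, R_u, sigma_v2, num_samples, rng):
    """Draw (u_{k,i}, d_k(i)) pairs for one node according to model (1).

    w_o       : (M,) unknown parameter vector
    R_u       : (M, M) positive-definite regressor covariance R_{u,k}
    sigma_v2  : measurement-noise variance sigma^2_{v,k}
    Returns U of shape (num_samples, M), whose rows are the 1 x M regressors,
    and d of shape (num_samples,).
    """
    M = w_o.shape[0]
    L = np.linalg.cholesky(R_u)                       # R_u = L L^T
    U = rng.standard_normal((num_samples, M)) @ L.T   # rows have covariance R_u
    v = np.sqrt(sigma_v2) * rng.standard_normal(num_samples)
    d = U @ w_o + v                                   # d_k(i) = u_{k,i} w^o + v_k(i)
    return U, d
```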

The nodes in the network would like to estimate $w^o$ by solving the following minimization problem:

$$\underset{w}{\text{minimize}} \;\; \sum_{k=1}^{N} \mathbb{E}\,|d_k(i) - u_{k,i}\, w|^2 \tag{2}$$

In previous works [9], [11], [13], we introduced and studied several distributed strategies of the diffusion
type that allow nodes to cooperate with each other in order to solve problems of the form (2) in an
adaptive manner. These diffusion strategies endow networks with adaptation and learning abilities, and
enable information to diffuse through the network in real-time. We review the adaptive diffusion strategies
below.

A. Diffusion Adaptation with Perfect Information Exchange 

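In [9], [11], two classes of diffusion algorithms were proposed. One class is the so-called
Combine-then-Adapt (CTA) strategy:

$$
\begin{aligned}
\phi_{k,i-1} &= \sum_{l \in \mathcal{N}_k} a_{1,lk}\, w_{l,i-1} \\
w_{k,i} &= \phi_{k,i-1} + \mu_k \sum_{l \in \mathcal{N}_k} c_{lk}\, u_{l,i}^* \left[ d_l(i) - u_{l,i}\, \phi_{k,i-1} \right]
\end{aligned} \tag{3}
$$

and the second class is the so-called Adapt-then-Combine (ATC) strategy:

$$
\begin{aligned}
\psi_{k,i} &= w_{k,i-1} + \mu_k \sum_{l \in \mathcal{N}_k} c_{lk}\, u_{l,i}^* \left[ d_l(i) - u_{l,i}\, w_{k,i-1} \right] \\
w_{k,i} &= \sum_{l \in \mathcal{N}_k} a_{2,lk}\, \psi_{l,i}
\end{aligned} \tag{4}
$$

where the $\{\mu_k\}$ are small positive step-size parameters and the $\{a_{1,lk}, c_{lk}, a_{2,lk}\}$ are nonnegative entries
of the $N \times N$ matrices $\{A_1, C, A_2\}$, respectively. The coefficients $\{a_{1,lk}, c_{lk}, a_{2,lk}\}$ are zero whenever
node l is not connected to node k, i.e., $l \notin \mathcal{N}_k$, where $\mathcal{N}_k$ denotes the neighborhood of node k. The two
strategies (3) and (4) can be integrated into one broad class of diffusion adaptation [11]:

$$\phi_{k,i-1} = \sum_{l \in \mathcal{N}_k} a_{1,lk}\, w_{l,i-1} \tag{5}$$

$$\psi_{k,i} = \phi_{k,i-1} + \mu_k \sum_{l \in \mathcal{N}_k} c_{lk}\, u_{l,i}^* \left[ d_l(i) - u_{l,i}\, \phi_{k,i-1} \right] \tag{6}$$

$$w_{k,i} = \sum_{l \in \mathcal{N}_k} a_{2,lk}\, \psi_{l,i} \tag{7}$$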

Several diffusion strategies can be obtained as special cases of (5)-(7) through proper selection of the
coefficients $\{a_{1,lk}, c_{lk}, a_{2,lk}\}$. For example, to recover the CTA strategy (3), we set $A_2 = I_N$, and to
recover the ATC strategy (4), we set $A_1 = I_N$, where $I_N$ denotes the $N \times N$ identity matrix. In the
general diffusion strategy (5)-(7), each node k evaluates its estimate $w_{k,i}$ at time i by relying solely on
the data collected from its neighbors through steps (5) and (7) and on its local measurements through
step (6). The matrices $A_1$, $A_2$, and $C$ are required to be left or right-stochastic, i.e.,

$$A_1^T \mathbb{1}_N = \mathbb{1}_N, \qquad A_2^T \mathbb{1}_N = \mathbb{1}_N, \qquad C\, \mathbb{1}_N = \mathbb{1}_N \tag{8}$$

where $\mathbb{1}_N$ denotes the $N \times 1$ vector whose entries are all one. This means that each node performs a
convex combination of the estimates received from its neighbors at every iteration i.
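
The general strategy (5)-(7) and the conditions in (8) can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions made here: the data are real-valued (so that conjugate transposition reduces to transposition), the matrices are stored so that entry `[l, k]` holds the weight applied to data flowing from node l to node k, and zero entries encode non-neighbors. The function names are hypothetical.

```python
import numpy as np

def check_combination_matrices(A1, A2, C, tol=1e-12):
    """Check nonnegativity and the stochasticity conditions in (8)."""
    ones = np.ones(A1.shape[0])
    return bool(
        min(A1.min(), A2.min(), C.min()) >= 0
        and np.allclose(A1.T @ ones, ones, atol=tol)   # A1^T 1 = 1
        and np.allclose(A2.T @ ones, ones, atol=tol)   # A2^T 1 = 1
        and np.allclose(C @ ones, ones, atol=tol)      # C 1 = 1
    )

def diffusion_step(W_prev, U, d, mu, A1, C, A2):
    """One iteration of (5)-(7) with perfect information exchange.

    W_prev : (N, M) rows are the estimates w_{l,i-1}
    U      : (N, M) rows are the regressors u_{l,i}
    d      : (N,)   measurements d_l(i)
    mu     : (N,)   step-sizes mu_k
    """
    Phi = A1.T @ W_prev                             # (5): phi_k = sum_l a1[l,k] w_l
    err = d[None, :] - Phi @ U.T                    # err[k, l] = d_l - u_l phi_k
    Psi = Phi + mu[:, None] * ((err * C.T) @ U)     # (6): sum_l c[l,k] u_l^T err[k,l]
    return A2.T @ Psi                               # (7): w_k = sum_l a2[l,k] psi_l
```

Setting `A2` to the identity recovers CTA, and setting `A1` to the identity recovers ATC, in line with the discussion above.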

The mean-square performance and convergence properties of the diffusion algorithm (5)-(7) have
already been studied in detail in [9], [11]. For the benefit of the analysis in the subsequent sections,
we present below in (21) the recursion describing the evolution of the weight error vectors across the
network. To do so, we introduce the error vectors:

$$\tilde{\phi}_{k,i-1} \triangleq w^o - \phi_{k,i-1} \tag{9}$$

$$\tilde{\psi}_{k,i} \triangleq w^o - \psi_{k,i} \tag{10}$$

$$\tilde{w}_{k,i} \triangleq w^o - w_{k,i} \tag{11}$$

and substitute the linear model (1) into the adaptation step (6) to find that

$$\tilde{\psi}_{k,i} = (I_M - \mu_k R_{k,i})\, \tilde{\phi}_{k,i-1} - \mu_k \sum_{l \in \mathcal{N}_k} c_{lk}\, s_{l,i} \tag{12}$$

where the $M \times M$ matrix $R_{k,i}$ and the $M \times 1$ vector $s_{k,i}$ are defined as:

$$R_{k,i} \triangleq \sum_{l \in \mathcal{N}_k} c_{lk}\, u_{l,i}^* u_{l,i} \tag{13}$$

$$s_{k,i} \triangleq u_{k,i}^*\, v_k(i) \tag{14}$$

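We further collect the various quantities across all nodes in the network into the following block vectors
and matrices:

$$\mathcal{R}_i \triangleq \mathrm{diag}\{R_{1,i}, \ldots, R_{N,i}\} \tag{15}$$

$$s_i \triangleq \mathrm{col}\{s_{1,i}, \ldots, s_{N,i}\} \tag{16}$$

$$\mathcal{M} \triangleq \mathrm{diag}\{\mu_1 I_M, \ldots, \mu_N I_M\} \tag{17}$$

$$\tilde{\phi}_i \triangleq \mathrm{col}\{\tilde{\phi}_{1,i}, \ldots, \tilde{\phi}_{N,i}\} \tag{18}$$

$$\tilde{\psi}_i \triangleq \mathrm{col}\{\tilde{\psi}_{1,i}, \ldots, \tilde{\psi}_{N,i}\} \tag{19}$$

$$\tilde{w}_i \triangleq \mathrm{col}\{\tilde{w}_{1,i}, \ldots, \tilde{w}_{N,i}\} \tag{20}$$

Then, from (5), (7), and (12), the recursion for the network error vector is given by

$$\tilde{w}_i = \mathcal{A}_2^T (I_{NM} - \mathcal{M}\mathcal{R}_i)\, \mathcal{A}_1^T\, \tilde{w}_{i-1} - \mathcal{A}_2^T \mathcal{M}\, \mathcal{C}^T s_i \tag{21}$$

where

$$\mathcal{A}_1 \triangleq A_1 \otimes I_M, \qquad \mathcal{C} \triangleq C \otimes I_M, \qquad \mathcal{A}_2 \triangleq A_2 \otimes I_M \tag{22}$$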
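
As a sanity check on (21)-(22), the following sketch runs one node-level iteration of (5)-(7) and compares the resulting error vector with the network-level recursion (21). It assumes real-valued data and stacks the node vectors row by row; the function names are hypothetical and the code is only meant to illustrate how the block quantities (13)-(17) are assembled.

```python
import numpy as np

def diffusion_step(W_prev, U, d, mu, A1, C, A2):
    """Node-level iteration (5)-(7) for real data; rows index nodes."""
    Phi = A1.T @ W_prev
    err = d[None, :] - Phi @ U.T
    Psi = Phi + mu[:, None] * ((err * C.T) @ U)
    return A2.T @ Psi

def network_error_step(w_err_prev, U, v, mu, A1, C, A2):
    """Network-level recursion (21) acting on col{w^o - w_{k,i-1}}."""
    N, M = U.shape
    I_M = np.eye(M)
    A1e, Ce, A2e = (np.kron(X, I_M) for X in (A1, C, A2))   # (22)
    Mcal = np.kron(np.diag(mu), I_M)                         # (17)
    R_i = np.zeros((N * M, N * M))
    for k in range(N):
        # (13): R_{k,i} = sum_l c_{lk} u_{l,i}^T u_{l,i}
        R_k = sum(C[l, k] * np.outer(U[l], U[l]) for l in range(N))
        R_i[k * M:(k + 1) * M, k * M:(k + 1) * M] = R_k      # (15)
    s_i = (U * v[:, None]).reshape(-1)                       # (14), (16)
    I_NM = np.eye(N * M)
    return (A2e.T @ (I_NM - Mcal @ R_i) @ A1e.T @ w_err_prev
            - A2e.T @ Mcal @ Ce.T @ s_i)
```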

B. Noisy Information Exchange 

Each of the steps in (5)-(7) involves the sharing of information between node k and its neighbors.
For example, in the first step (5), all neighbors of node k send their estimates $w_{l,i-1}$ to node k. This
transmission is generally subject to additive noise and possibly quantization errors. Likewise, steps (6)
and (7) involve the sharing of other pieces of information with node k. These exchange steps can all
be subject to perturbations (such as additive noise and quantization errors). One of the objectives of
this work is to analyze the aggregate effect of these perturbations on general diffusion strategies of the
type (5)-(7) and to propose choices for the combination weights in order to enhance the mean-square
performance of the network in the presence of these disturbances.

Fig. 1. Several additive noise sources perturb the exchange of information from node l to node k.



Let us examine what happens when information is exchanged over links with additive noise. We
model the data received by node k from its neighbor l as

$$w_{lk,i-1} \triangleq w_{l,i-1} + v^{(w)}_{lk,i-1} \tag{23}$$

$$\psi_{lk,i} \triangleq \psi_{l,i} + v^{(\psi)}_{lk,i} \tag{24}$$

$$d_{lk}(i) \triangleq d_l(i) + v^{(d)}_{lk}(i) \tag{25}$$

$$u_{lk,i} \triangleq u_{l,i} + v^{(u)}_{lk,i} \tag{26}$$

where $v^{(w)}_{lk,i-1}$ and $v^{(\psi)}_{lk,i}$ are $M \times 1$ noise signals, $v^{(u)}_{lk,i}$ is a $1 \times M$ noise signal, and $v^{(d)}_{lk}(i)$ is a scalar noise
signal (see Fig. 1). Observe further that in (23)-(26), we are including several sources of information
exchange noise. In comparison, references [33]-[35] only considered the noise source $v^{(w)}_{lk,i-1}$ in (23) and
one set of combination coefficients $\{a_{1,lk}\}$; the other coefficients were set to $c_{lk} = a_{2,lk} = 0$ for $l \neq k$
and $c_{kk} = a_{2,kk} = 1$. In other words, these references only considered (23) and the following traditional
CTA strategy without exchange of the data $\{d_l(i), u_{l,i}\}$ (compare with (3); note that the second step
in (27) only uses $\{d_k(i), u_{k,i}\}$):

$$
\begin{aligned}
\phi_{k,i-1} &= \sum_{l \in \mathcal{N}_k} a_{1,lk}\, w_{lk,i-1} \\
w_{k,i} &= \phi_{k,i-1} + \mu_k\, u_{k,i}^* \left[ d_k(i) - u_{k,i}\, \phi_{k,i-1} \right]
\end{aligned} \tag{27}
$$

The analysis that follows examines the aggregate effect of all four noise sources appearing in (23)-(26),
in addition to the three sets of combination coefficients appearing in (5)-(7). We introduce the following
assumption on the statistical properties of the measurement data and noise signals.



June 4, 2012 



DRAFT 






Assumption 2.1 (Statistical properties of the variables):

1) The regression data $u_{k,i}$ are temporally white and spatially independent random variables with zero
mean and covariance matrix $R_{u,k} \triangleq \mathbb{E}\, u_{k,i}^* u_{k,i} > 0$.

2) The noise signals $v_k(i)$, $v^{(w)}_{lk,i-1}$, $v^{(d)}_{lk}(i)$, $v^{(u)}_{lk,i}$, and $v^{(\psi)}_{lk,i}$ are temporally white and spatially
independent random variables with zero mean and (co)variances $\sigma^2_{v,k}$, $R^{(w)}_{v,lk}$, $\sigma^2_{v,lk}$, $R^{(u)}_{v,lk}$, and $R^{(\psi)}_{v,lk}$,
respectively. In addition, $R^{(w)}_{v,lk}$, $\sigma^2_{v,lk}$, $R^{(u)}_{v,lk}$, and $R^{(\psi)}_{v,lk}$ are all zero if $l \notin \mathcal{N}_k$ or $l = k$.

3) The regression data $\{u_{m,i_1}\}$, the model noise signals $\{v_n(i_2)\}$, and the link noise signals $\{v^{(w)}_{l_1 k_1, j_1}\}$,
$\{v^{(d)}_{l_2 k_2}(j_2)\}$, $\{v^{(u)}_{l_3 k_3, j_3}\}$, and $\{v^{(\psi)}_{l_4 k_4, j_4}\}$ are mutually-independent random variables for all $\{i_1, i_2, j_1, j_2, j_3, j_4\}$
and $\{m, n, l_1, l_2, l_3, l_4, k_1, k_2, k_3, k_4\}$. ■

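Using the perturbed data (23)-(26), the diffusion algorithm (5)-(7) becomes

$$\phi_{k,i-1} = \sum_{l \in \mathcal{N}_k} a_{1,lk}\, w_{lk,i-1} \tag{28}$$

$$\psi_{k,i} = \phi_{k,i-1} + \mu_k \sum_{l \in \mathcal{N}_k} c_{lk}\, u_{lk,i}^* \left[ d_{lk}(i) - u_{lk,i}\, \phi_{k,i-1} \right] \tag{29}$$

$$w_{k,i} = \sum_{l \in \mathcal{N}_k} a_{2,lk}\, \psi_{lk,i} \tag{30}$$

where we continue to use the symbols $\{\phi_{k,i-1}, \psi_{k,i}, w_{k,i}\}$ to avoid an explosion of notation. From (23)
and (24), expressions (28)-(30) can be rewritten as

$$\phi_{k,i-1} = \sum_{l \in \mathcal{N}_k} a_{1,lk}\, w_{l,i-1} + v^{(w)}_{k,i-1} \tag{31}$$

$$\psi_{k,i} = \phi_{k,i-1} + \mu_k \sum_{l \in \mathcal{N}_k} c_{lk}\, u_{lk,i}^* \left[ d_{lk}(i) - u_{lk,i}\, \phi_{k,i-1} \right] \tag{32}$$

$$w_{k,i} = \sum_{l \in \mathcal{N}_k} a_{2,lk}\, \psi_{l,i} + v^{(\psi)}_{k,i} \tag{33}$$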

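where we are introducing the symbols $v^{(w)}_{k,i-1}$ and $v^{(\psi)}_{k,i}$ to denote the aggregate $M \times 1$ zero-mean noise
signals defined over the neighborhood of node k:

$$v^{(w)}_{k,i-1} \triangleq \sum_{l \in \mathcal{N}_k \setminus \{k\}} a_{1,lk}\, v^{(w)}_{lk,i-1} \tag{34}$$

$$v^{(\psi)}_{k,i} \triangleq \sum_{l \in \mathcal{N}_k \setminus \{k\}} a_{2,lk}\, v^{(\psi)}_{lk,i} \tag{35}$$

The covariance matrices of these aggregate noise signals are

$$R^{(w)}_{v,k} \triangleq \sum_{l \in \mathcal{N}_k \setminus \{k\}} a_{1,lk}^2\, R^{(w)}_{v,lk} \tag{36}$$

$$R^{(\psi)}_{v,k} \triangleq \sum_{l \in \mathcal{N}_k \setminus \{k\}} a_{2,lk}^2\, R^{(\psi)}_{v,lk} \tag{37}$$

It is worth noting that $R^{(w)}_{v,k}$ and $R^{(\psi)}_{v,k}$ depend on the combination coefficients $\{a_{1,lk}\}$ and $\{a_{2,lk}\}$,
respectively. This property will be taken into account when optimizing over $\{a_{1,lk}\}$ and $\{a_{2,lk}\}$ in a
later section. We further introduce the following scalar zero-mean noise signal:

$$v_{lk}(i) \triangleq v_l(i) + v^{(d)}_{lk}(i) - v^{(u)}_{lk,i}\, w^o \tag{38}$$

for $l \in \mathcal{N}_k \setminus \{k\}$, whose variance is

$$\sigma^2_{lk} \triangleq \sigma^2_{v,l} + \sigma^2_{v,lk} + w^{o*} R^{(u)}_{v,lk}\, w^o \tag{39}$$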

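To unify the notation, we define $v_{kk}(i) \triangleq v_k(i)$. Then, from (1), (25), and (26), it is easy to verify that
the noisy data $\{d_{lk}(i), u_{lk,i}\}$ are related via

$$d_{lk}(i) = u_{lk,i}\, w^o + v_{lk}(i) \tag{40}$$

for $l \in \mathcal{N}_k$. Continuing with the adaptation step (32) and substituting (40), we get

$$\psi_{k,i} = \phi_{k,i-1} + \mu_k \sum_{l \in \mathcal{N}_k} c_{lk}\, u_{lk,i}^* \left[ u_{lk,i}\, \tilde{\phi}_{k,i-1} + v_{lk}(i) \right] \tag{41}$$

Then, we can derive the following error recursion for node k (compare with (12)):

$$\tilde{\psi}_{k,i} = (I_M - \mu_k R'_{k,i})\, \tilde{\phi}_{k,i-1} - \mu_k z_{k,i} \tag{42}$$

where the $M \times M$ matrix $R'_{k,i}$ and the $M \times 1$ vector $z_{k,i}$ are defined as (compare with (13) and (14)):

$$R'_{k,i} \triangleq \sum_{l \in \mathcal{N}_k} c_{lk}\, u_{lk,i}^* u_{lk,i} \tag{43}$$

$$z_{k,i} \triangleq \sum_{l \in \mathcal{N}_k} c_{lk}\, u_{lk,i}^* v_{lk}(i) \tag{44}$$

We further introduce the block vectors and matrices:

$$\mathcal{R}'_i \triangleq \mathrm{diag}\{R'_{1,i}, \ldots, R'_{N,i}\} \tag{45}$$

$$z_i \triangleq \mathrm{col}\{z_{1,i}, \ldots, z_{N,i}\} \tag{46}$$

$$v^{(w)}_i \triangleq \mathrm{col}\{v^{(w)}_{1,i}, \ldots, v^{(w)}_{N,i}\} \tag{47}$$

$$v^{(\psi)}_i \triangleq \mathrm{col}\{v^{(\psi)}_{1,i}, \ldots, v^{(\psi)}_{N,i}\} \tag{48}$$

and the corresponding covariance matrices for $v^{(w)}_i$ and $v^{(\psi)}_i$:

$$\mathcal{R}^{(w)}_v \triangleq \mathrm{diag}\{R^{(w)}_{v,1}, \ldots, R^{(w)}_{v,N}\} \tag{49}$$

$$\mathcal{R}^{(\psi)}_v \triangleq \mathrm{diag}\{R^{(\psi)}_{v,1}, \ldots, R^{(\psi)}_{v,N}\} \tag{50}$$

Then, from (31), (33), and (42), we arrive at the following recursion for the network weight error vector
in the presence of noisy information exchange:

$$
\begin{aligned}
\tilde{w}_i &= \mathcal{A}_2^T \left[ (I_{NM} - \mathcal{M}\mathcal{R}'_i)\, \tilde{\phi}_{i-1} - \mathcal{M} z_i \right] - v^{(\psi)}_i \\
&= \mathcal{A}_2^T \left[ (I_{NM} - \mathcal{M}\mathcal{R}'_i) \left( \mathcal{A}_1^T \tilde{w}_{i-1} - v^{(w)}_{i-1} \right) - \mathcal{M} z_i \right] - v^{(\psi)}_i
\end{aligned} \tag{51}
$$

That is,

$$\tilde{w}_i = \mathcal{A}_2^T (I_{NM} - \mathcal{M}\mathcal{R}'_i)\, \mathcal{A}_1^T \tilde{w}_{i-1} - \mathcal{A}_2^T (I_{NM} - \mathcal{M}\mathcal{R}'_i)\, v^{(w)}_{i-1} - \mathcal{A}_2^T \mathcal{M} z_i - v^{(\psi)}_i \tag{52}$$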



Compared to the previous error recursion (21), the noise terms in (52) consist of three parts:

• $\mathcal{A}_2^T (I_{NM} - \mathcal{M}\mathcal{R}'_i)\, v^{(w)}_{i-1}$ is contributed by the noise introduced at the information-exchange step
(28) before adaptation.

• $\mathcal{A}_2^T \mathcal{M} z_i$ is contributed by the noise introduced at the adaptation step (29).

• $v^{(\psi)}_i$ is contributed by the noise introduced at the information-exchange step (30) after adaptation.
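
The noisy strategy (28)-(30), driven by the perturbed data (23)-(26), can be simulated along the following lines. This is only a sketch under assumptions made here: real-valued data, white Gaussian link noise with a common standard deviation per noise type (i.e., covariances of the form $\sigma^2 I$), and no link noise on a node's own data. The function name `noisy_diffusion_step` and the dictionary keys are hypothetical.

```python
import numpy as np

def noisy_diffusion_step(W_prev, U, d, mu, A1, C, A2, noise_std, rng):
    """One iteration of (28)-(30) with link noise modeled as in (23)-(26).

    Arrays indexed [l, k, ...] hold what node k receives from node l.
    noise_std has keys "w", "psi", "d", "u" (standard deviations).
    """
    N, M = W_prev.shape
    off = 1.0 - np.eye(N)        # zero on the diagonal: no noise when l = k

    # (23): noisy copies of the neighbors' weight estimates
    W_rx = W_prev[:, None, :] + noise_std["w"] * off[:, :, None] * rng.standard_normal((N, N, M))
    Phi = np.einsum("lk,lkm->km", A1, W_rx)                    # (28)

    # (25), (26): noisy copies of the neighbors' measurements and regressors
    D_rx = d[:, None] + noise_std["d"] * off * rng.standard_normal((N, N))
    U_rx = U[:, None, :] + noise_std["u"] * off[:, :, None] * rng.standard_normal((N, N, M))
    err = D_rx - np.einsum("lkm,km->lk", U_rx, Phi)            # d_lk - u_lk phi_k
    Psi = Phi + mu[:, None] * np.einsum("lk,lk,lkm->km", C, err, U_rx)   # (29)

    # (24): noisy copies of the neighbors' intermediate estimates
    Psi_rx = Psi[:, None, :] + noise_std["psi"] * off[:, :, None] * rng.standard_normal((N, N, M))
    return np.einsum("lk,lkm->km", A2, Psi_rx)                 # (30)
```

With all standard deviations set to zero, the step reduces to the noise-free strategy (5)-(7).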

III. Convergence in the Mean with a Bias 



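Given the weight error recursion (52), we are now ready to study the mean convergence condition
for the diffusion strategy (28)-(30) in the presence of disturbances during information exchange under
Assumption 2.1. Taking expectations of both sides of (52), we get

$$\mathbb{E}\tilde{w}_i = \mathcal{B}\, \mathbb{E}\tilde{w}_{i-1} - \mathcal{A}_2^T (I_{NM} - \mathcal{M}\mathcal{R}') \cdot \mathbb{E} v^{(w)}_{i-1} - \mathcal{A}_2^T \mathcal{M} \cdot \mathbb{E} z_i - \mathbb{E} v^{(\psi)}_i \tag{53}$$

where

$$\mathcal{B} \triangleq \mathcal{A}_2^T (I_{NM} - \mathcal{M}\mathcal{R}')\, \mathcal{A}_1^T \tag{54}$$

$$\mathcal{R}' \triangleq \mathbb{E}\mathcal{R}'_i = \mathrm{diag}\{R'_1, \ldots, R'_N\} \tag{55}$$

$$R'_k \triangleq \mathbb{E} R'_{k,i} = \sum_{l \in \mathcal{N}_k} c_{lk} \left( R_{u,l} + R^{(u)}_{v,lk} \right) \tag{56}$$

From (34), (35), (47), and (48), it can be verified that

$$\mathbb{E} v^{(w)}_{i-1} = \mathbb{E} v^{(\psi)}_i = 0 \tag{57}$$

whereas, from (44) and Assumption 2.1, we get

$$\mathbb{E} z_{k,i} = \mathbb{E} \left\{ \sum_{l \in \mathcal{N}_k} c_{lk} \left( u_{l,i} + v^{(u)}_{lk,i} \right)^* \left( v_l(i) + v^{(d)}_{lk}(i) - v^{(u)}_{lk,i}\, w^o \right) \right\} = - \left( \sum_{l \in \mathcal{N}_k} c_{lk}\, R^{(u)}_{v,lk} \right) w^o \tag{58}$$

Let us define an $NM \times NM$ matrix $\mathcal{R}^{(u)}_c$ that collects all covariance matrices $\{R^{(u)}_{v,lk}\}$, $k, l = 1, \ldots, N$,
weighted by the corresponding combination coefficients $\{c_{lk}\}$, such that its $(k, l)$th $M \times M$ submatrix
is $c_{lk} R^{(u)}_{v,lk}$. Note that $\mathcal{R}^{(u)}_c$ itself is not a covariance matrix because $c_{kk} R^{(u)}_{v,kk} = 0$ for all k. Then, from
(46) and (58), we arrive at

$$z \triangleq \mathbb{E} z_i = -\mathcal{R}^{(u)}_c \left( \mathbb{1}_N \otimes w^o \right) \tag{59}$$

Therefore, using (57) and (59), expression (53) becomes

$$\mathbb{E}\tilde{w}_i = \mathcal{B} \cdot \mathbb{E}\tilde{w}_{i-1} - \mathcal{A}_2^T \mathcal{M} z \tag{60}$$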

with a driving term due to the presence of $z$. This driving term would disappear from (60) if there were
no noise during the exchange of the regression data. To guarantee convergence of (60), the coefficient
matrix $\mathcal{B}$ must be stable, i.e., $\rho(\mathcal{B}) < 1$. Since $\mathcal{A}_1^T$ and $\mathcal{A}_2^T$ are right-stochastic matrices, it can be shown
that the matrix $\mathcal{B}$ is stable whenever $I_{NM} - \mathcal{M}\mathcal{R}'$ itself is stable (see Appendix A). This fact leads to an
upper bound on the step-sizes $\{\mu_k\}$ to guarantee the convergence of $\mathbb{E}\tilde{w}_i$ to a steady-state value, namely,
we must have

$$\mu_k < \frac{2}{\lambda_{\max}(R'_k)} \tag{61}$$

for $k = 1, 2, \ldots, N$, where $\lambda_{\max}(\cdot)$ denotes the largest eigenvalue of its matrix argument. Note that the
neighborhood covariance matrix $R'_k$ in (56) is related to the combination weights $\{c_{lk}\}$. If we further
assume that $C$ is doubly-stochastic, i.e.,

$$C\, \mathbb{1}_N = \mathbb{1}_N, \qquad C^T \mathbb{1}_N = \mathbb{1}_N \tag{62}$$

then, by Jensen's inequality [48],

$$\lambda_{\max}(R'_k) = \lambda_{\max}\left( \sum_{l \in \mathcal{N}_k} c_{lk} \left( R_{u,l} + R^{(u)}_{v,lk} \right) \right) \le \sum_{l \in \mathcal{N}_k} c_{lk}\, \lambda_{\max}\left( R_{u,l} + R^{(u)}_{v,lk} \right) \le \max_{l \in \mathcal{N}_k} \lambda_{\max}\left( R_{u,l} + R^{(u)}_{v,lk} \right) \tag{63}$$

since (i) $\lambda_{\max}(\cdot)$ coincides with the induced 2-norm for any positive semi-definite Hermitian matrix; (ii)
matrix norms are convex functions of their arguments [49]; and (iii) by (62), $\{c_{lk}\}$ are convex combination
coefficients. Thus, we obtain a sufficient condition for the convergence of (60) in lieu of (61):

$$\mu_k < \frac{2}{\max_{l \in \mathcal{N}_k} \lambda_{\max}\left( R_{u,l} + R^{(u)}_{v,lk} \right)} \tag{64}$$

for $k = 1, 2, \ldots, N$, where the upper bound for the step-size $\mu_k$ becomes independent of the combination
weights $\{c_{lk}\}$. This bound can be determined solely from knowledge of the covariances of the regression
data and the associated noise signals that are accessible to node k. It is worth noting that for traditional
diffusion algorithms where information is perfectly exchanged, condition (64) reduces to

$$\mu_k < \frac{2}{\max_{l \in \mathcal{N}_k} \lambda_{\max}(R_{u,l})} \tag{65}$$

for $k = 1, 2, \ldots, N$. Comparing (64) with (65), we see that the link noise $v^{(u)}_{lk,i}$ over regression data
reduces the dynamic range of the step-sizes for mean stability. Now, under (61), and taking the limit of
(60) as $i \to \infty$, we find that the mean error vector will converge to a steady-state value $g$:

$$g \triangleq \lim_{i \to \infty} \mathbb{E}\tilde{w}_i = -(I_{NM} - \mathcal{B})^{-1} \mathcal{A}_2^T \mathcal{M} z \tag{66}$$
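
The quantities in this section can be evaluated numerically as sketched below: the neighborhood covariances (56), the driving term built from (58), the coefficient matrix (54), the step-size bounds (61), and the bias (66). The sketch assumes real symmetric covariance matrices; the function name and the array layout (`Rv_u[l, k]` storing $R^{(u)}_{v,lk}$) are choices made here for illustration.

```python
import numpy as np

def mean_recursion_quantities(A1, A2, C, mu, R_u, Rv_u, w_o):
    """Return (B, z, g, bounds) for the mean recursion (60).

    A1, A2, C : (N, N) combination matrices, entry [l, k] weights link l -> k
    mu        : (N,) step-sizes
    R_u       : (N, M, M) regressor covariances R_{u,l}
    Rv_u      : (N, N, M, M) regression link-noise covariances R^{(u)}_{v,lk}
    w_o       : (M,) unknown parameter vector
    """
    N, M = C.shape[0], w_o.shape[0]
    I_M, I_NM = np.eye(M), np.eye(N * M)
    Rp = np.zeros((N * M, N * M))      # block-diagonal R' in (55)
    z = np.zeros(N * M)                # stacked mean of z_i, see (58)-(59)
    bounds = np.zeros(N)               # right-hand side of (61)
    for k in range(N):
        blk = slice(k * M, (k + 1) * M)
        Rk = np.einsum("l,lab->ab", C[:, k], R_u + Rv_u[:, k])        # (56)
        Rp[blk, blk] = Rk
        z[blk] = -np.einsum("l,lab->ab", C[:, k], Rv_u[:, k]) @ w_o   # (58)
        bounds[k] = 2.0 / np.linalg.eigvalsh(Rk).max()                # (61)
    Mcal = np.kron(np.diag(mu), I_M)
    A1e, A2e = np.kron(A1, I_M), np.kron(A2, I_M)
    B = A2e.T @ (I_NM - Mcal @ Rp) @ A1e.T                            # (54)
    g = -np.linalg.solve(I_NM - B, A2e.T @ Mcal @ z)                  # (66)
    return B, z, g, bounds
```

When `Rv_u` is identically zero, `z` and hence `g` vanish, which matches the observation that the bias originates from noise over the regression data.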



IV. Mean-Square Convergence Analysis 

It is well-known that studying the mean-square convergence of a single adaptive filter is a challenging
task, since adaptive filters are nonlinear, time-variant, and stochastic systems. When a network of adaptive
nodes is considered, the complexity of the analysis is compounded because the nodes now influence
each other's behavior. In order to make the performance analysis more tractable, we rely on the energy
conservation approach [32], [50], which was used successfully in [9], [11] to study the mean-square
performance of diffusion strategies under perfect information exchange conditions. That argument allows
us to derive expressions for the mean-square-deviation (MSD) and the excess-mean-square-error (EMSE)
of the network by analyzing how energy (measured in terms of error variances) flows through the nodes.

From recursion (52) and under Assumption 2.1, we can obtain the following weighted variance relation
for the global error vector $\tilde{w}_i$:

$$
\begin{aligned}
\mathbb{E}\|\tilde{w}_i\|^2_{\Sigma} ={}& \mathbb{E}\|\tilde{w}_{i-1}\|^2_{\Sigma'} + \mathbb{E}\|\mathcal{A}_2^T \mathcal{M} z_i\|^2_{\Sigma} - 2\,\mathrm{Re}\left\{ \mathbb{E}\left[ z_i^* \mathcal{M} \mathcal{A}_2 \Sigma \mathcal{A}_2^T (I_{NM} - \mathcal{M}\mathcal{R}'_i)\, \mathcal{A}_1^T \tilde{w}_{i-1} \right] \right\} \\
&+ \mathbb{E}\|\mathcal{A}_2^T (I_{NM} - \mathcal{M}\mathcal{R}'_i)\, v^{(w)}_{i-1}\|^2_{\Sigma} + \mathbb{E}\|v^{(\psi)}_i\|^2_{\Sigma}
\end{aligned} \tag{67}
$$

where $\Sigma$ is an arbitrary $NM \times NM$ positive semi-definite Hermitian matrix that we are free to choose.
Moreover, the notation $\|x\|^2_{\Sigma}$ stands for the quadratic term $x^* \Sigma x$. The weighting matrix $\Sigma'$ in (67) can
be expressed as

$$\Sigma' = \mathcal{B}^* \Sigma \mathcal{B} + O(\mathcal{M}^2) \tag{68}$$

where $\mathcal{B}$ is given by (54) and $O(\mathcal{M}^2)$ denotes a term on the order of $\mathcal{M}^2$. Evaluating the term $O(\mathcal{M}^2)$
requires knowledge of higher-order statistics of the regression data and link noises, which are not available
under current assumptions. However, this term becomes negligible if we introduce a small step-size
assumption.

Assumption 4.1 (Small step-sizes): The step-sizes are sufficiently small, i.e., $\mu_k \ll 1$, such that terms
depending on higher-order powers of the step-sizes can be ignored. ■

Hence, in the sequel we use the approximation:

$$\Sigma' \approx \mathcal{B}^* \Sigma \mathcal{B} \tag{69}$$

Observe that on the right-hand side (RHS) of relation (67), only the first and third terms relate to the
error vector $\tilde{w}_{i-1}$. By Assumption 2.1, the error vector $\tilde{w}_{i-1}$ is independent of $z_i$ and $\mathcal{R}'_i$. Thus, from
(59), the third term on the RHS of (67) can be expressed as

$$
\begin{aligned}
\text{Third term on RHS of (67)} &= -2\,\mathrm{Re}\left\{ \mathbb{E}\left[ z_i^* \mathcal{M} \mathcal{A}_2 \Sigma \mathcal{A}_2^T (I_{NM} - \mathcal{M}\mathcal{R}'_i)\, \mathcal{A}_1^T \right] \cdot \mathbb{E}\tilde{w}_{i-1} \right\} \\
&= -2\,\mathrm{Re}\left( z^* \mathcal{M} \mathcal{A}_2 \Sigma \mathcal{A}_2^T \mathcal{A}_1^T \cdot \mathbb{E}\tilde{w}_{i-1} \right) + O(\mathcal{M}^2)
\end{aligned} \tag{70}
$$

Since we already showed in the previous section that $\mathbb{E}\tilde{w}_i$ converges to a fixed bias $g$, quantity (70) will
converge to a fixed value as well when $i \to \infty$. Moreover, under Assumption 2.1, the second, fourth,
and fifth terms on the RHS of relation (67) are all fixed values. Therefore, the convergence of relation (67)
depends on the behavior of the first term $\mathbb{E}\|\tilde{w}_{i-1}\|^2_{\Sigma'}$. Although the weighting matrix $\Sigma'$ of $\tilde{w}_{i-1}$ is
different from the weighting matrix $\Sigma$ of $\tilde{w}_i$, it turns out that the entries of these two matrices are
approximately related by a linear equation shown ahead in (72). Introduce the vector notation [32]:

$$\sigma = \mathrm{vec}(\Sigma), \qquad \sigma' = \mathrm{vec}(\Sigma') \tag{71}$$

Then, by using the identity $\mathrm{vec}(ABC) = (C^T \otimes A) \cdot \mathrm{vec}(B)$, it can be verified from (69) that

$$\sigma' \approx \mathcal{F} \cdot \sigma \tag{72}$$

where the $N^2M^2 \times N^2M^2$ matrix $\mathcal{F}$ is given by

$$\mathcal{F} \triangleq \mathcal{B}^T \otimes \mathcal{B}^* \tag{73}$$

To guarantee mean-square convergence of the algorithm, the step-sizes should be sufficiently small and
selected to ensure that the matrix $\mathcal{F}$ is stable [32], i.e., $\rho(\mathcal{F}) < 1$, which is equivalent to the earlier
condition $\rho(\mathcal{B}) < 1$. Although more specific conditions for mean-square stability can be determined
without Assumption 4.1 [32], it is sufficient for our purposes here to conclude that the diffusion strategy
(28)-(30) is stable in the mean and mean-square senses if the step-sizes $\{\mu_k\}$ satisfy (61) or (64) and
are sufficiently small.
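
The vectorization step from (69) to (72)-(73), and the equivalence between $\rho(\mathcal{F}) < 1$ and $\rho(\mathcal{B}) < 1$, can be checked numerically. The sketch below uses a randomly generated complex matrix as a stand-in for $\mathcal{B}$ (an assumption made purely for illustration) and relies on the fact that the eigenvalues of a Kronecker product are the pairwise products of the factors' eigenvalues, so that $\rho(\mathcal{F}) = \rho(\mathcal{B})^2$.

```python
import numpy as np

def vec(X):
    """Stack the columns of X (column-major order)."""
    return np.asarray(X).reshape(-1, order="F")

def weighting_map(B):
    """F = B^T kron B^* from (73), so that vec(B^* Sigma B) = F vec(Sigma)."""
    return np.kron(B.T, B.conj().T)

rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B *= 0.8 / np.max(np.abs(np.linalg.eigvals(B)))   # rescale so that rho(B) = 0.8

X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Sigma = X @ X.conj().T                            # Hermitian positive semi-definite

F = weighting_map(B)
sigma_next = F @ vec(Sigma)                       # right-hand side of (72)
direct = vec(B.conj().T @ Sigma @ B)              # vec of the right-hand side of (69)

rho_B = np.max(np.abs(np.linalg.eigvals(B)))
rho_F = np.max(np.abs(np.linalg.eigvals(F)))
```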



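V. Steady-State Performance Analysis

The conclusion so far is that sufficiently small step-sizes ensure convergence of the diffusion strategy
(28)-(30) in the mean and mean-square senses, even in the presence of exchange noises over the
communication links. Let us now determine expressions for the error variances in steady-state. We start
from the weighted variance relation (67). In view of (70), it shows that the error variance $\mathbb{E}\|\tilde{w}_i\|^2_{\Sigma}$ depends
on the mean error $\mathbb{E}\tilde{w}_i$. We already determined the value of $\lim_{i \to \infty} \mathbb{E}\tilde{w}_i$ in (66).

A. Steady-State Variance Relation

We continue to use the vector notation (71) and proceed to evaluate all the terms, except the first one,
on the RHS of (67). The second term can be expressed as

$$\text{Second term on RHS of (67)} = \mathrm{Tr}(\mathcal{A}_2^T \mathcal{M} \mathcal{R}_z \mathcal{M} \mathcal{A}_2 \Sigma) = \left[ \mathrm{vec}(\mathcal{A}_2^T \mathcal{M} \mathcal{R}_z \mathcal{M} \mathcal{A}_2) \right]^* \sigma \tag{74}$$

where we used the identity $\mathrm{Tr}(W\Sigma) = [\mathrm{vec}(W)]^* \sigma$ for any Hermitian matrix $W$, and $\mathcal{R}_z$ denotes the
autocorrelation matrix of $z_i$. It is shown in Appendix B that $\mathcal{R}_z$ is given by

$$\mathcal{R}_z \triangleq \mathbb{E}\, z_i z_i^* \approx \mathcal{C}^T \mathcal{S}\, \mathcal{C} + \mathcal{T} + z z^* \tag{75}$$

where $\mathcal{C}$ is defined in (22), $z$ is in (59), and $\{\mathcal{S}, \mathcal{T}\}$ are two $NM \times NM$ positive semi-definite block
diagonal matrices:

$$\mathcal{S} \triangleq \mathrm{diag}\{\sigma^2_{v,1} R_{u,1}, \ldots, \sigma^2_{v,N} R_{u,N}\} \tag{76}$$

$$\mathcal{T} \triangleq \mathrm{diag}\{T_1, \ldots, T_N\} \tag{77}$$

with the $M \times M$ blocks $\{T_k\}$ defined in (78).

From expression (70) and Assumption 4.1, the third term on the RHS of (67) is given by

$$
\begin{aligned}
\text{Third term on RHS of (67)} &\approx -z^* \mathcal{M} \mathcal{A}_2 \Sigma \mathcal{A}_2^T \mathcal{A}_1^T (\mathbb{E}\tilde{w}_{i-1}) - (\mathbb{E}\tilde{w}_{i-1})^* \mathcal{A}_1 \mathcal{A}_2 \Sigma \mathcal{A}_2^T \mathcal{M} z \\
&= -\mathrm{Tr}\left\{ \left[ \mathcal{A}_2^T \mathcal{A}_1^T (\mathbb{E}\tilde{w}_{i-1})\, z^* \mathcal{M} \mathcal{A}_2 + \mathcal{A}_2^T \mathcal{M} z\, (\mathbb{E}\tilde{w}_{i-1})^* \mathcal{A}_1 \mathcal{A}_2 \right] \Sigma \right\} \\
&= -\left[ \mathrm{vec}\left( \mathcal{A}_2^T \mathcal{A}_1^T (\mathbb{E}\tilde{w}_{i-1})\, z^* \mathcal{M} \mathcal{A}_2 + \mathcal{A}_2^T \mathcal{M} z\, (\mathbb{E}\tilde{w}_{i-1})^* \mathcal{A}_1 \mathcal{A}_2 \right) \right]^* \sigma
\end{aligned} \tag{79}
$$

Likewise, the fourth term on the RHS of (67) is approximated by

$$\text{Fourth term on RHS of (67)} \approx \left[ \mathrm{vec}(\mathcal{A}_2^T \mathcal{R}^{(w)}_v \mathcal{A}_2) \right]^* \sigma \tag{80}$$

where we are now ignoring terms on the order of $\mathcal{M}$ and $\mathcal{M}^2$. The fifth term on the RHS of (67) is given
by

$$\text{Fifth term on RHS of (67)} = \left[ \mathrm{vec}(\mathcal{R}^{(\psi)}_v) \right]^* \sigma \tag{81}$$

Let us introduce

$$\mathcal{R}_v \triangleq \mathcal{A}_2^T \mathcal{R}^{(w)}_v \mathcal{A}_2 + \mathcal{R}^{(\psi)}_v + \mathcal{A}_2^T \mathcal{M} (\mathcal{T} + z z^*) \mathcal{M} \mathcal{A}_2 \tag{82}$$

$$\mathcal{Y} \triangleq -\mathcal{A}_2^T \mathcal{A}_1^T\, g\, z^* \mathcal{M} \mathcal{A}_2 = \mathcal{A}_2^T \mathcal{A}_1^T (I_{NM} - \mathcal{B})^{-1} \mathcal{A}_2^T \mathcal{M} z z^* \mathcal{M} \mathcal{A}_2 \tag{83}$$

At steady-state, as $i \to \infty$, by (66) and (74)-(83), the weighted variance relation (67) becomes

$$\lim_{i \to \infty} \mathbb{E}\|\tilde{w}_i\|^2_{\sigma} \approx \lim_{i \to \infty} \mathbb{E}\|\tilde{w}_{i-1}\|^2_{\mathcal{F}\sigma} + \left[ \mathrm{vec}(\mathcal{A}_2^T \mathcal{M} \mathcal{C}^T \mathcal{S}\, \mathcal{C} \mathcal{M} \mathcal{A}_2 + \mathcal{R}_v + \mathcal{Y} + \mathcal{Y}^*) \right]^* \sigma \tag{84}$$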



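where we are using the compact notation $\|x\|^2_{\sigma}$ to refer to $\|x\|^2_{\Sigma}$; doing so allows us to represent $\Sigma'$
by the more compact relation $\mathcal{F}\sigma$ on the RHS of (84). We shall be using the weighting matrix $\Sigma$ and its
vector representation $\sigma$ interchangeably for ease of notation (likewise, for $\Sigma'$ and $\sigma'$). The steady-state
weighted variance relation (84) can be rewritten as

$$\lim_{i \to \infty} \mathbb{E}\|\tilde{w}_i\|^2_{(I_{N^2M^2} - \mathcal{F})\sigma} \approx \left[ \mathrm{vec}(\mathcal{A}_2^T \mathcal{M} \mathcal{C}^T \mathcal{S}\, \mathcal{C} \mathcal{M} \mathcal{A}_2 + \mathcal{R}_v + \mathcal{Y} + \mathcal{Y}^*) \right]^* \sigma \tag{85}$$

where the term $\mathcal{A}_2^T \mathcal{M} \mathcal{C}^T \mathcal{S}\, \mathcal{C} \mathcal{M} \mathcal{A}_2$ is contributed by the model noise $\{v_k(i)\}$ while the remaining terms
$\{\mathcal{R}_v, \mathcal{Y}\}$ are contributed by the link noises $\{v^{(w)}_{lk,i-1}, v^{(d)}_{lk}(i), v^{(u)}_{lk,i}, v^{(\psi)}_{lk,i}\}$. Recall that we are free to choose
$\Sigma$ and, hence, $\sigma$. Let $(I_{N^2M^2} - \mathcal{F})\sigma = \mathrm{vec}(\Omega)$, where $\Omega$ is another arbitrary positive semi-definite
Hermitian matrix. Then, we arrive at the following theorem.

Theorem 5.1 (Steady-state weighted variance relation): Under Assumptions 2.1 and 4.1, for any
positive semi-definite Hermitian matrix $\Omega$, the steady-state weighted error variance relation of the diffusion
strategy (28)-(30) is approximately given by

$$\lim_{i \to \infty} \mathbb{E}\|\tilde{w}_i\|^2_{\Omega} \approx \left[ \mathrm{vec}(\mathcal{A}_2^T \mathcal{M} \mathcal{C}^T \mathcal{S}\, \mathcal{C} \mathcal{M} \mathcal{A}_2 + \mathcal{R}_v + \mathcal{Y} + \mathcal{Y}^*) \right]^* (I_{N^2M^2} - \mathcal{F})^{-1}\, \mathrm{vec}(\Omega) \tag{86}$$

where $\mathcal{S}$ is given in (76), $\mathcal{R}_v$ in (82), $\mathcal{Y}$ in (83), and $\mathcal{F}$ in (73).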



B. Network MSD and EMSE 

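Each subvector of $\tilde{w}_i$ corresponds to the estimation error at a particular node, say, $\tilde{w}_{k,i}$ for node k.
The network MSD is defined as [32]:

$$\mathrm{MSD} \triangleq \lim_{i \to \infty} \frac{1}{N} \sum_{k=1}^{N} \mathbb{E}\|\tilde{w}_{k,i}\|^2 \tag{87}$$

Since we are free to choose $\Omega$, we select it as $\Omega = I_{NM}/N$. Then, expression (86) gives

$$\mathrm{MSD} \approx \frac{1}{N} \left[ \mathrm{vec}(\mathcal{A}_2^T \mathcal{M} \mathcal{C}^T \mathcal{S}\, \mathcal{C} \mathcal{M} \mathcal{A}_2 + \mathcal{R}_v + \mathcal{Y} + \mathcal{Y}^*) \right]^* (I_{N^2M^2} - \mathcal{F})^{-1}\, \mathrm{vec}(I_{NM}) \tag{88}$$

Similarly, if we instead select $\Omega = \mathcal{R}_u / N$, where

$$\mathcal{R}_u \triangleq \mathrm{diag}\{R_{u,1}, \ldots, R_{u,N}\} \tag{89}$$

then expression (86) would allow us to evaluate the network EMSE as:

$$\mathrm{EMSE} \approx \frac{1}{N} \left[ \mathrm{vec}(\mathcal{A}_2^T \mathcal{M} \mathcal{C}^T \mathcal{S}\, \mathcal{C} \mathcal{M} \mathcal{A}_2 + \mathcal{R}_v + \mathcal{Y} + \mathcal{Y}^*) \right]^* (I_{N^2M^2} - \mathcal{F})^{-1}\, \mathrm{vec}(\mathcal{R}_u) \tag{90}$$

where, under Assumption 2.1, the network EMSE is given by

$$\mathrm{EMSE} \triangleq \lim_{i \to \infty} \frac{1}{N} \sum_{k=1}^{N} \mathbb{E}\,|u_{k,i}\, \tilde{w}_{k,i-1}|^2 \tag{91}$$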



C. Simplifications when Regression Data are not Shared 

We showed in the earlier sections that the link noise over regression data biases the weight estimators. 
In this section we examine how the results simplify when there is no sharing of regression data among 
the nodes. 

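Assumption 5.2 (No sharing of regression data): Nodes do not share regression data within
neighborhoods, i.e., assume $C = I_N$. ■

By Assumptions 4.1 and 5.2, matrices $\{\mathcal{B}, \mathcal{R}_v, \mathcal{Y}\}$ in (54), (82), and (83) become

$$\mathcal{B} = \mathcal{A}_2^T (I_{NM} - \mathcal{M}\mathcal{R}_u)\, \mathcal{A}_1^T \tag{92}$$

$$\mathcal{R}_v = \mathcal{A}_2^T \mathcal{R}^{(w)}_v \mathcal{A}_2 + \mathcal{R}^{(\psi)}_v \tag{93}$$

$$\mathcal{Y} = 0 \tag{94}$$

where $\mathcal{R}_u$ is given in (89). Then, the network MSD and EMSE expressions (88) and (90) simplify to:

$$\mathrm{MSD} \approx \frac{1}{N} \left[ \mathrm{vec}(\mathcal{A}_2^T \mathcal{M} \mathcal{S} \mathcal{M} \mathcal{A}_2 + \mathcal{R}_v) \right]^* (I_{N^2M^2} - \mathcal{F})^{-1}\, \mathrm{vec}(I_{NM}) \tag{95}$$

and

$$\mathrm{EMSE} \approx \frac{1}{N} \left[ \mathrm{vec}(\mathcal{A}_2^T \mathcal{M} \mathcal{S} \mathcal{M} \mathcal{A}_2 + \mathcal{R}_v) \right]^* (I_{N^2M^2} - \mathcal{F})^{-1}\, \mathrm{vec}(\mathcal{R}_u) \tag{96}$$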



D. Dependence of Performance on Combination Weights and Link Noise 

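Recalling that $\mathcal{R}_v$ and $\mathcal{F}$ are related to the combination matrices $\{A_1, A_2\}$, or, equivalently, $\{\mathcal{A}_1, \mathcal{A}_2\}$,
results (95) and (96) express the network MSD and EMSE in terms of $\{A_1, A_2\}$. However, it is generally
difficult to use these expressions to optimize over $\{A_1, A_2\}$ to reduce the impact of link noise. Instead, by
substituting (73) into (95) and using the fact that $\mathcal{F}$ is stable, we can arrive at another useful expression
for the network MSD:

$$
\begin{aligned}
\mathrm{MSD} &\approx \frac{1}{N} \left[ \mathrm{vec}(\mathcal{A}_2^T \mathcal{M} \mathcal{S} \mathcal{M} \mathcal{A}_2 + \mathcal{R}_v) \right]^* \sum_{j=0}^{\infty} \mathcal{F}^j\, \mathrm{vec}(I_{NM}) \\
&= \frac{1}{N} \left[ \mathrm{vec}(\mathcal{A}_2^T \mathcal{M} \mathcal{S} \mathcal{M} \mathcal{A}_2 + \mathcal{R}_v) \right]^* \sum_{j=0}^{\infty} (\mathcal{B}^T \otimes \mathcal{B}^*)^j\, \mathrm{vec}(I_{NM}) \\
&= \frac{1}{N} \left[ \mathrm{vec}(\mathcal{A}_2^T \mathcal{M} \mathcal{S} \mathcal{M} \mathcal{A}_2 + \mathcal{R}_v) \right]^* \sum_{j=0}^{\infty} \mathrm{vec}(\mathcal{B}^{*j} \mathcal{B}^j)
\end{aligned} \tag{97}
$$

That is,

$$\mathrm{MSD} \approx \frac{1}{N} \sum_{j=0}^{\infty} \mathrm{Tr}\left[ \mathcal{B}^j (\mathcal{A}_2^T \mathcal{M} \mathcal{S} \mathcal{M} \mathcal{A}_2 + \mathcal{R}_v)\, \mathcal{B}^{*j} \right] \tag{98}$$

where $\mathcal{B}$ is given in (92). Similarly, the network EMSE can be expressed as

$$\mathrm{EMSE} \approx \frac{1}{N} \sum_{j=0}^{\infty} \mathrm{Tr}\left[ \mathcal{B}^j (\mathcal{A}_2^T \mathcal{M} \mathcal{S} \mathcal{M} \mathcal{A}_2 + \mathcal{R}_v)\, \mathcal{B}^{*j} \mathcal{R}_u \right] \tag{99}$$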
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Expressions (1981 ) and d99l reveal in an interesting way how the noise sources originating from any 
particular node end up influencing the overall network performance. Let us denote 

Bi 4 A T 2 (I NM - M-RfiAj (100) 
0i 4 Aj{I NM - MK'^vtl + AjMzi + vf ] (101) 
The eiTor recursion (l52l can be rewritten as 



Wi = BiWi-i - 6i 



(102) 



m=0 



where 



Bi&i-i . . . B m , i > m 



(103) 



' NM 5 



i < m 



Then, 



m=0 



E||i«i|| 2 = E||* ,i™-i|| 2 +E 

Under Assumption 15.21 {Bj, #j} in (1 1001 ) and (1 1 1 b can be simplified as 

= Aj(I NM - M-Ki)A\ 
6 t = A J 2 {I NM - MTZijv^l + A J 2 Ms t + vf 



(104) 

(105) 
(106) 



where {Hi, s^} are given in ( fTSl ) and (fl6l ). By Assumption 12.11 {Bi, 0i} are temporally independent for 
different i and 



E Bi = B, E 0i = 
where 6 is given by d92j ). As i — > oo, the first term on RHS of d 104b becomes 

First term on RHS of (fT04l) = lim Tr {E [*o,i(E«5-i«5*i)*o,i] } 



(107) 



(a) 



lim Tr [(E* 0> i) (Eu5_i«)* x ) (E* 



OA, 



lim Tr 

i— >oo 



^(EuLiw*!)^** 1 )* 



(6) 



(108) 
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where (a) is obtained by approximating the expectation of the product by the product of expectations 
and (b) is due to the stability of B. Therefore, the steady-state value of (1104b gives 



lim E\\ Wi \\ 2 = lim V E ||* m+ i.i0 m || 2 

i—^oo i— >oo ^— ' 
m=0 

i 

« lim V Tr [(E* m+ i,i) (E6> m 6>^) (E* m+1 

j— >oo z — ' 

m=0 

i 

lim V Tr f^- m (^JX5X^ 2 + 7Zt,)B 



(a) 

ss nm 

i— >oc 

m=0 

i 

( = ] lim V Tr \B j {jJ 2 MSMA 2 + IZ v )B j * 

i—too z — ' L 
3=0 

oo 

= Tr [s J (4^5Mi 2 + Tl v )B* j 

3=0 

where, by d93l and (11061 ). (a) is due to 



(109) 



m m 0* m « ^J(/jvm - MTZ u )TZ^"\l NM - MK U )A 2 + AjMSMA 2 + 

= Aj-MSA^ + ^ (HO) 

and (b) is simply a change of variable: j = i — m. Since the jth term of the summation in (|98l ) or (11091 ) 
is contributed by the term K0i^j6*_j, which consists of all the noise sources at time i — j, expression 
(|98T ) shows how various sources of noises are involved and how they contribute to the network MSD. 
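The equivalence between the closed form (95) and the series form (98) can be checked numerically. The sketch below is only an illustration under stated assumptions: it restricts itself to real-valued data (so that $\mathcal{B}^* = \mathcal{B}^T$), takes $\mathcal{F} = \mathcal{B}^T \otimes \mathcal{B}^*$ as suggested by the second line of (97), and uses hypothetical values for the network size, step-sizes, variances, and a stand-in matrix `Rv` for $\mathcal{R}_v$. None of these values come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 3, 2  # hypothetical network size and filter length


def left_stochastic(n):
    """Random nonnegative n x n matrix whose columns each sum to one."""
    A = rng.random((n, n))
    return A / A.sum(axis=0, keepdims=True)


def vec(X):
    """Stack the columns of X on top of each other."""
    return X.flatten(order="F")


I_M, I_NM = np.eye(M), np.eye(N * M)
A1 = np.kron(left_stochastic(N), I_M)            # extended A_1
A2 = np.kron(left_stochastic(N), I_M)            # extended A_2
sigma_u2 = rng.uniform(0.5, 1.5, N)              # assumed R_{u,k} = sigma_u2[k] * I_M
sigma_v2 = rng.uniform(0.01, 0.1, N)             # assumed model-noise variances
Mu = 0.05 * I_NM                                 # assumed uniform step-size matrix
Ru = np.kron(np.diag(sigma_u2), I_M)
S = np.kron(np.diag(sigma_v2 * sigma_u2), I_M)   # assumed form of S without data sharing
Rv = 1e-3 * I_NM                                 # stand-in for R_v in (93)

B = A2.T @ (I_NM - Mu @ Ru) @ A1.T               # (92)
X = A2.T @ Mu @ S @ Mu @ A2 + Rv
F = np.kron(B.T, B.T)                            # real-valued case of B^T kron B^*

# Closed form (95): (1/N) vec(X)^T (I - F)^{-1} vec(I_NM)
msd_closed = vec(X) @ np.linalg.solve(np.eye((N * M) ** 2) - F, vec(I_NM)) / N

# Series form (98): (1/N) sum_j Tr(B^j X B^{*j}), truncated after 3000 terms
msd_series, P = 0.0, X.copy()
for _ in range(3000):
    msd_series += np.trace(P) / N
    P = B @ P @ B.T

print(msd_closed, msd_series)
```

The series form only needs products of $NM \times NM$ matrices, whereas the closed form solves a linear system of size $N^2M^2$; for larger networks the truncated series is therefore the cheaper of the two to evaluate.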

VI. Optimizing the Combination Matrices 

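Before we optimize the combination matrices $\{A_1, A_2\}$, we first specialize the MSD expression (98) and the EMSE expression (99) for the ATC and CTA algorithms. For the ATC algorithm, we set $A_1 = I_N$ and $A_2 = A$, and for the CTA algorithm, we set $A_1 = A$ and $A_2 = I_N$. Let us denote

$$\mathcal{A} = A \otimes I_M \tag{111}$$

$$\mathcal{B}_{\mathrm{atc}} = \mathcal{A}^T(I_{NM} - \mathcal{M}\mathcal{R}_u) \tag{112}$$

$$\mathcal{B}_{\mathrm{cta}} = (I_{NM} - \mathcal{M}\mathcal{R}_u)\mathcal{A}^T \tag{113}$$

Then, we get

$$\mathrm{MSD}_{\mathrm{atc}} \approx \frac{1}{N}\sum_{j=0}^{\infty}\mathrm{Tr}\left[\mathcal{B}_{\mathrm{atc}}^{j}\left(\mathcal{A}^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A} + \mathcal{R}_v^{(\psi)}\right)\mathcal{B}_{\mathrm{atc}}^{*j}\right] \tag{114}$$

$$\mathrm{EMSE}_{\mathrm{atc}} \approx \frac{1}{N}\sum_{j=0}^{\infty}\mathrm{Tr}\left[\mathcal{B}_{\mathrm{atc}}^{j}\left(\mathcal{A}^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A} + \mathcal{R}_v^{(\psi)}\right)\mathcal{B}_{\mathrm{atc}}^{*j}\mathcal{R}_u\right] \tag{115}$$

and

$$\mathrm{MSD}_{\mathrm{cta}} \approx \frac{1}{N}\sum_{j=0}^{\infty}\mathrm{Tr}\left[\mathcal{B}_{\mathrm{cta}}^{j}\left(\mathcal{M}\mathcal{S}\mathcal{M} + \mathcal{R}_v^{(w)}\right)\mathcal{B}_{\mathrm{cta}}^{*j}\right] \tag{116}$$

$$\mathrm{EMSE}_{\mathrm{cta}} \approx \frac{1}{N}\sum_{j=0}^{\infty}\mathrm{Tr}\left[\mathcal{B}_{\mathrm{cta}}^{j}\left(\mathcal{M}\mathcal{S}\mathcal{M} + \mathcal{R}_v^{(w)}\right)\mathcal{B}_{\mathrm{cta}}^{*j}\mathcal{R}_u\right] \tag{117}$$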



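A. An Upper Bound for MSD

Minimizing the MSD expression (114) or the EMSE expression (115) for the ATC algorithm over left-stochastic matrices $A$ is generally nontrivial. We pursue an approximate solution that relies on optimizing an upper bound and performs well in practice. Let us use $\|X\|_*$ to denote the nuclear norm (also known as the trace norm, or the Ky Fan $n$-norm) of matrix $X$ [51], which is defined as the sum of the singular values of $X$. Therefore, $\|X\|_* = \|X^*\|_*$ for any $X$ and $\|X\|_* = \mathrm{Tr}(X)$ when $X$ is Hermitian and positive semi-definite. Let us also denote $\|X\|_{b,\infty}$ as the block maximum norm of matrix $X$ (see Appendix A). Then,

$$\begin{aligned}
\mathrm{Tr}\left[\mathcal{B}_{\mathrm{atc}}^{j}\left(\mathcal{A}^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A} + \mathcal{R}_v^{(\psi)}\right)\mathcal{B}_{\mathrm{atc}}^{*j}\right]
&= \left\|\mathcal{B}_{\mathrm{atc}}^{j}\left(\mathcal{A}^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A} + \mathcal{R}_v^{(\psi)}\right)\mathcal{B}_{\mathrm{atc}}^{*j}\right\|_* \\
&\le \|\mathcal{B}_{\mathrm{atc}}^{j}\|_* \cdot \left\|\mathcal{A}^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A} + \mathcal{R}_v^{(\psi)}\right\|_* \cdot \|\mathcal{B}_{\mathrm{atc}}^{*j}\|_* \\
&\le c^2 \cdot \|\mathcal{B}_{\mathrm{atc}}^{j}\|_{b,\infty}^2 \cdot \mathrm{Tr}\left(\mathcal{A}^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A} + \mathcal{R}_v^{(\psi)}\right) \\
&\le c^2 \cdot \|\mathcal{B}_{\mathrm{atc}}\|_{b,\infty}^{2j} \cdot \mathrm{Tr}\left(\mathcal{A}^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A} + \mathcal{R}_v^{(\psi)}\right) \\
&\le c^2 \cdot \left(\|\mathcal{A}^T\|_{b,\infty} \cdot \|I_{NM} - \mathcal{M}\mathcal{R}_u\|_{b,\infty}\right)^{2j} \cdot \mathrm{Tr}\left(\mathcal{A}^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A} + \mathcal{R}_v^{(\psi)}\right) \\
&= c^2 \cdot \left[\rho(I_{NM} - \mathcal{M}\mathcal{R}_u)\right]^{2j} \cdot \mathrm{Tr}\left(\mathcal{A}^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A} + \mathcal{R}_v^{(\psi)}\right)
\end{aligned} \tag{118}$$

where $c$ is some positive scalar such that $\|X\|_* \le c\,\|X\|_{b,\infty}$ because $\|X\|_*$ and $\|X\|_{b,\infty}$ are submultiplicative norms and all such norms are equivalent [49]. In the last step of (118) we used Lemmas A.4 and A.5 from Appendix A. Thus, we can upper bound the network MSD (114) by

$$\begin{aligned}
\mathrm{MSD}_{\mathrm{atc}} &\le \frac{1}{N}\sum_{j=0}^{\infty} c^2 \cdot \left[\rho(I_{NM} - \mathcal{M}\mathcal{R}_u)\right]^{2j} \cdot \mathrm{Tr}\left(\mathcal{A}^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A} + \mathcal{R}_v^{(\psi)}\right) \\
&= \frac{c^2}{N} \cdot \frac{\mathrm{Tr}\left(\mathcal{A}^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A} + \mathcal{R}_v^{(\psi)}\right)}{1 - \left[\rho(I_{NM} - \mathcal{M}\mathcal{R}_u)\right]^2}
\end{aligned} \tag{119}$$

where the combination matrix $\mathcal{A}$ appears only in the numerator.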
B. Minimizing the Upper Bound 

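The result (119) motivates us to consider instead the problem of minimizing the upper bound, namely,

$$\begin{aligned}
&\underset{A}{\text{minimize}} && \mathrm{Tr}\left(\mathcal{A}^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A} + \mathcal{R}_v^{(\psi)}\right) \\
&\text{subject to} && A^T\mathbb{1} = \mathbb{1}, \quad a_{lk} \ge 0, \quad a_{lk} = 0 \ \text{if}\ l \notin \mathcal{N}_k
\end{aligned} \tag{120}$$

Using (50) and (76), the cost function in (120) can be expressed as

$$\mathrm{Tr}\left(\mathcal{A}^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A} + \mathcal{R}_v^{(\psi)}\right) = \sum_{k=1}^{N}\sum_{l\in\mathcal{N}_k} a_{lk}^2\left[\mu_l^2\sigma_{v,l}^2\,\mathrm{Tr}(R_{u,l}) + \mathrm{Tr}\left(R_{v,lk}^{(\psi)}\right)\right] \tag{121}$$

Problem (120) can therefore be decoupled into $N$ separate optimization problems of the form:

$$\begin{aligned}
&\underset{\{a_{lk},\, l\in\mathcal{N}_k\}}{\text{minimize}} && \sum_{l\in\mathcal{N}_k} a_{lk}^2\left[\mu_l^2\sigma_{v,l}^2\,\mathrm{Tr}(R_{u,l}) + \mathrm{Tr}\left(R_{v,lk}^{(\psi)}\right)\right] \\
&\text{subject to} && \sum_{l\in\mathcal{N}_k} a_{lk} = 1, \quad a_{lk} \ge 0, \quad a_{lk} = 0 \ \text{if}\ l \notin \mathcal{N}_k
\end{aligned} \tag{122}$$

for $k = 1, \ldots, N$. With each node $l \in \mathcal{N}_k$, we associate the following nonnegative variance product measure:

$$\gamma_{lk}^2 \triangleq \begin{cases}\mu_k^2\sigma_{v,k}^2\,\mathrm{Tr}(R_{u,k}), & l = k \\ \mu_l^2\sigma_{v,l}^2\,\mathrm{Tr}(R_{u,l}) + \mathrm{Tr}\left(R_{v,lk}^{(\psi)}\right), & l \in \mathcal{N}_k\backslash\{k\}\end{cases} \tag{123}$$

This measure incorporates information about the link noise covariances $\{R_{v,lk}^{(\psi)}\}$. The solution of (122) is then given by

$$a_{lk} = \begin{cases}\dfrac{\gamma_{lk}^{-2}}{\sum_{m\in\mathcal{N}_k}\gamma_{mk}^{-2}}, & \text{if}\ l \in \mathcal{N}_k \\ 0, & \text{otherwise}\end{cases} \qquad \text{(relative variance rule)} \tag{124}$$

We refer to this combination rule as the relative variance combination rule; it is an extension of the rule devised in [52] to the case of noisy information exchanges. In particular, the definition of the scalars $\{\gamma_{lk}^2\}$ in (123) is different and now depends on both subscripts $l$ and $k$.

Minimizing the EMSE expression (115) for the ATC algorithm over left-stochastic matrices $A$ can be pursued in a similar manner by noting that

$$\mathrm{Tr}\left[\mathcal{B}_{\mathrm{atc}}^{j}\left(\mathcal{A}^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A} + \mathcal{R}_v^{(\psi)}\right)\mathcal{B}_{\mathrm{atc}}^{*j}\mathcal{R}_u\right] \le c^2\left[\rho(I_{NM} - \mathcal{M}\mathcal{R}_u)\right]^{2j}\,\mathrm{Tr}\left(\mathcal{A}^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A} + \mathcal{R}_v^{(\psi)}\right)\mathrm{Tr}(\mathcal{R}_u) \tag{125}$$

Thus, minimizing the upper bound of the network EMSE leads to the same solution (124). Using the same argument, we can also show that the same result minimizes the upper bound of the network MSD or EMSE for the CTA algorithm.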
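As an illustration of the relative variance rule (124), the following sketch builds the variance products of (123) for a small network and normalizes their inverses column by column. The three-node line topology, the step-sizes, and all variance and trace values are hypothetical placeholders chosen only for the example, and the function name `relative_variance_weights` is not from the paper.

```python
import numpy as np


def relative_variance_weights(gamma2, neighbors):
    """Return A with a_{lk} = gamma_{lk}^{-2} / sum_m gamma_{mk}^{-2} for l in N_k.

    gamma2[l, k] holds gamma_{lk}^2 and neighbors[l, k] is True when l is in N_k.
    """
    inv = np.zeros_like(gamma2, dtype=float)
    np.divide(1.0, gamma2, out=inv, where=neighbors)   # zero outside the neighborhood
    return inv / inv.sum(axis=0, keepdims=True)        # normalize each column k


# Hypothetical line network 1 - 2 - 3; every node belongs to its own neighborhood.
neighbors = np.array([[1, 1, 0],
                      [1, 1, 1],
                      [0, 1, 1]], dtype=bool)
mu = np.array([0.01, 0.01, 0.01])          # step-sizes
sigma_v2 = np.array([0.1, 0.4, 0.2])       # model-noise variances
tr_Ru = np.array([2.0, 2.0, 2.0])          # Tr(R_{u,l})
tr_Rpsi = 1e-5 * np.array([[0.0, 1.0, 0.0],
                           [2.0, 0.0, 4.0],
                           [0.0, 3.0, 0.0]])  # Tr(R^{(psi)}_{v,lk}), zero for l = k

gamma2 = (mu**2 * sigma_v2 * tr_Ru)[:, None] + tr_Rpsi   # variance products (123)
A = relative_variance_weights(gamma2, neighbors)
uniform = neighbors / neighbors.sum(axis=0, keepdims=True)


def cost(W):
    """Cost function of (121) for a given weight matrix W."""
    return float(np.sum(W**2 * gamma2 * neighbors))


print(A)
print(cost(A), cost(uniform))
```

For each node $k$, the minimum of (122) equals $1/\sum_{m\in\mathcal{N}_k}\gamma_{mk}^{-2}$, which gives a quick sanity check on any implementation of the rule.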



C. Adaptive Combination Rule 

To apply the relative variance combination rule (124), each node $k$ needs to know the variance products, $\{\gamma_{lk}^2\}$, of its neighbors, which in general are not available since they require knowledge of the quantities $\{\sigma_{v,l}^2, \mathrm{Tr}(R_{u,l}), \mathrm{Tr}(R_{v,lk}^{(\psi)})\}$. Therefore, we now propose an adaptive combination rule by using data that are available to the individual nodes. For the ATC algorithm, we first note from (24) and (29) that

$$\mathbb{E}\,\|\psi_{lk,i} - w_{l,i-1}\|^2 \approx \mu_l^2\sigma_{v,l}^2\,\mathrm{Tr}(R_{u,l}) + \mathrm{Tr}\left(R_{v,lk}^{(\psi)}\right) = \gamma_{lk}^2 \tag{126}$$

for $l \in \mathcal{N}_k\backslash\{k\}$. Since the algorithm converges in the mean and mean-square senses under Assumption 4.1, all the estimates $\{w_{k,i}\}$ tend close to $w^o$ as $i \to \infty$. This allows us to estimate $\gamma_{lk}^2$ for node $k$ by using instantaneous realizations of $\|\psi_{lk,i} - w_{k,i-1}\|^2$, where we replace $w_{l,i-1}$ by $w_{k,i-1}$. Similarly, for node $k$ itself, we can use realizations of $\|\psi_{k,i} - w_{k,i-1}\|^2$ to estimate $\gamma_{kk}^2$. To unify the notation, we define $\psi_{kk,i} \triangleq \psi_{k,i}$. Let $\hat{\gamma}_{lk}^2(i)$ denote an estimator for $\gamma_{lk}^2$ that is computed by node $k$ at time $i$. Then, one way to evaluate $\hat{\gamma}_{lk}^2(i)$ is through the recursion:

$$\hat{\gamma}_{lk}^2(i) = (1 - \nu_k)\,\hat{\gamma}_{lk}^2(i-1) + \nu_k\,\|\psi_{lk,i} - w_{k,i-1}\|^2 \tag{127}$$

for $l \in \mathcal{N}_k$, where $\nu_k \in (0, 1)$ is a forgetting factor that is usually close to one. In this way, we arrive at the adaptive combination rule:

$$a_{lk}(i) = \begin{cases}\dfrac{\hat{\gamma}_{lk}^{-2}(i)}{\sum_{m\in\mathcal{N}_k}\hat{\gamma}_{mk}^{-2}(i)}, & \text{if}\ l \in \mathcal{N}_k \\ 0, & \text{otherwise}\end{cases} \tag{128}$$
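A minimal sketch of how a single node $k$ might run recursion (127) and form the weights (128) is given below. The function names, the positive initialization of the estimates, and the fixed received vectors are assumptions made for the example; the value $\nu_k = 0.05$ is the one reported in the simulation section. With constant inputs the recursion converges to the squared distances themselves, which makes the behavior easy to verify.

```python
import numpy as np


def update_gamma(gamma_hat, psi_recv, w_prev, nu):
    """One step of recursion (127) at node k.

    gamma_hat: current estimates, one per neighbor l in N_k (including l = k).
    psi_recv:  array with one received intermediate estimate psi_{lk,i} per row.
    w_prev:    the node's own previous estimate w_{k,i-1}.
    """
    err2 = np.sum(np.abs(psi_recv - w_prev) ** 2, axis=1)
    return (1.0 - nu) * gamma_hat + nu * err2


def adaptive_weights(gamma_hat):
    """Adaptive combination rule (128) restricted to the neighborhood of node k."""
    inv = 1.0 / gamma_hat
    return inv / inv.sum()


# Hypothetical node with three neighbors and M = 2.
w_prev = np.zeros(2, dtype=complex)
psi_recv = np.array([[1.0, 0.0],
                     [0.0, 2.0j],
                     [0.5, 0.0]])          # squared distances 1, 4, and 0.25
gamma_hat = np.ones(3)                     # positive start avoids division by zero
for _ in range(500):
    gamma_hat = update_gamma(gamma_hat, psi_recv, w_prev, nu=0.05)

a_k = adaptive_weights(gamma_hat)
print(gamma_hat, a_k)
```

In an actual ATC implementation `psi_recv` and `w_prev` change at every iteration, so the estimates fluctuate around $\gamma_{lk}^2$ rather than settling exactly.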
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VII. Mean-Square Tracking Behavior 

The diffusion strategy ©-(7) is adaptive in nature. One of the main benefits of adaptation (by using constant step-sizes) is that it endows networks with tracking abilities when the underlying weight vector $w^o$ varies with time. In this section we analyze how well an adaptive network is able to track variations in $w^o$. To do so, we adopt a random-walk model for $w^o$ that is commonly used in the literature to describe the non-stationarity of the weight vector [32].

Assumption 7.1 (Random-walk model): The weight vector $w^o_i$ changes according to the model:

$$w^o_i = w^o_{i-1} + \eta_i \tag{129}$$

where $\{w^o_i\}$ has a constant mean $w^o$ for all $i$, $\{\eta_i\}$ is an i.i.d. random sequence with zero mean and covariance matrix $R_\eta$; the sequence $\{\eta_i\}$ is independent of the initial conditions $\{w^o_{-1}, w_{k,-1}\}$ and of all regression data and noise signals across the network for all time instants. ■

We now define the error vector at node $k$ as

$$\tilde{w}_{k,i} \triangleq w^o_i - w_{k,i} \tag{130}$$

so that the global error recursion (52) for the network is replaced by

$$\begin{aligned}
\tilde{w}_i ={}& \mathcal{A}_2^T(I_{NM} - \mathcal{M}\mathcal{R}'_i)\mathcal{A}_1^T\tilde{w}_{i-1} + \mathcal{A}_2^T(I_{NM} - \mathcal{M}\mathcal{R}'_i)\mathcal{A}_1^T\zeta_i \\
&- \mathcal{A}_2^T(I_{NM} - \mathcal{M}\mathcal{R}'_i)v^{(w)}_{i-1} - \mathcal{A}_2^T\mathcal{M}z_i - v^{(\psi)}_i
\end{aligned} \tag{131}$$

where the $NM \times 1$ vector $\zeta_i$ is defined as

$$\zeta_i = \mathrm{col}\{\eta_i, \ldots, \eta_i\} = \mathbb{1}_N \otimes \eta_i \tag{132}$$

A. Convergence Conditions 

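By Assumptions 2.1 and 7.1, it can be verified that the condition for mean convergence continues to be $\rho(\mathcal{B}) < 1$, where $\mathcal{B}$ is defined in (54). In addition, it can also be verified that the error recursion (131) converges in the mean sense to the same non-zero bias vector $g$ as in (66). From (131) and under Assumption 4.1, we can derive the weighted variance relation:

$$\begin{aligned}
\mathbb{E}\|\tilde{w}_i\|_\sigma^2 \approx{}& \mathbb{E}\|\tilde{w}_{i-1}\|_{\mathcal{F}\sigma}^2 + \mathbb{E}\|\mathcal{A}_2^T(I_{NM} - \mathcal{M}\mathcal{R}'_i)\mathcal{A}_1^T\zeta_i\|_\sigma^2 \\
&- 2\,\mathrm{Re}\left\{\mathbb{E}\left[z_i^*\mathcal{M}\mathcal{A}_2\Sigma\mathcal{A}_2^T(I_{NM} - \mathcal{M}\mathcal{R}'_i)\mathcal{A}_1^T\tilde{w}_{i-1}\right]\right\} \\
&+ \mathbb{E}\|\mathcal{A}_2^T(I_{NM} - \mathcal{M}\mathcal{R}'_i)v^{(w)}_{i-1}\|_\sigma^2 \\
&+ \mathbb{E}\|\mathcal{A}_2^T\mathcal{M}z_i\|_\sigma^2 + \mathbb{E}\|v^{(\psi)}_i\|_\sigma^2
\end{aligned} \tag{133}$$

where $\mathcal{F}$ is given in (73). If the step-sizes are sufficiently small, then we can assume that the network continues to be mean-square stable.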



B. Steady-State Performance 

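The steady-state performance is affected by the non-stationarity of $w^o$. From Assumption 4.1, at steady-state, expression (133) becomes

$$\lim_{i\to\infty}\mathbb{E}\|\tilde{w}_i\|_\sigma^2 \approx \left[\mathrm{vec}\left(\mathcal{A}_2^T\mathcal{M}\mathcal{C}^T\mathcal{S}\mathcal{C}\mathcal{M}\mathcal{A}_2 + \mathcal{R}_\zeta + \mathcal{R}_v + \mathcal{Y} + \mathcal{Y}^*\right)\right]^*\left(I_{N^2M^2} - \mathcal{F}\right)^{-1}\sigma \tag{134}$$

where $\mathcal{S}$ is given in (76), $\mathcal{R}_v$ in (82), $\mathcal{Y}$ in (83), $\mathcal{F}$ in (73), and $\mathcal{R}_\zeta$ is the covariance matrix of $\zeta_i$:

$$\mathcal{R}_\zeta \triangleq \mathbb{E}\,\zeta_i\zeta_i^* = (\mathbb{1}_N\mathbb{1}_N^T) \otimes R_\eta \tag{135}$$

By ©, (22), and (135) we get

$$\begin{aligned}
\mathcal{A}_2^T\mathcal{A}_1^T\mathcal{R}_\zeta\mathcal{A}_1\mathcal{A}_2 &= \left(A_2^TA_1^T\mathbb{1}_N\mathbb{1}_N^TA_1A_2\right) \otimes R_\eta \\
&= (\mathbb{1}_N\mathbb{1}_N^T) \otimes R_\eta \\
&= \mathcal{R}_\zeta
\end{aligned} \tag{136}$$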
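Identity (136) relies only on the columns of $A_1$ and $A_2$ summing to one, and it can be confirmed numerically. The sketch below uses randomly generated left-stochastic matrices and a random positive semi-definite $R_\eta$; the sizes and the random seed are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 4, 2  # hypothetical network size and filter length


def left_stochastic(n):
    """Random nonnegative n x n matrix whose columns each sum to one."""
    A = rng.random((n, n))
    return A / A.sum(axis=0, keepdims=True)


A1, A2 = left_stochastic(N), left_stochastic(N)
T = rng.standard_normal((M, M))
R_eta = T @ T.T                                   # a random covariance matrix
ones = np.ones((N, 1))
R_zeta = np.kron(ones @ ones.T, R_eta)            # (135)

cA1 = np.kron(A1, np.eye(M))                      # extended A_1
cA2 = np.kron(A2, np.eye(M))                      # extended A_2
lhs = cA2.T @ cA1.T @ R_zeta @ cA1 @ cA2          # left-hand side of (136)

print(np.max(np.abs(lhs - R_zeta)))
```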



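Then, following the same argument that led to (88), we find that the network MSD is now given by:

$$\mathrm{MSD}_{\mathrm{trk}} \approx \frac{1}{N}\left[\mathrm{vec}\left(\mathcal{A}_2^T\mathcal{M}\mathcal{C}^T\mathcal{S}\mathcal{C}\mathcal{M}\mathcal{A}_2 + \mathcal{R}_\zeta + \mathcal{R}_v + \mathcal{Y} + \mathcal{Y}^*\right)\right]^*\left(I_{N^2M^2} - \mathcal{F}\right)^{-1}\mathrm{vec}(I_{NM}) \tag{137}$$

Similarly, the network EMSE is given by:

$$\mathrm{EMSE}_{\mathrm{trk}} \approx \frac{1}{N}\left[\mathrm{vec}\left(\mathcal{A}_2^T\mathcal{M}\mathcal{C}^T\mathcal{S}\mathcal{C}\mathcal{M}\mathcal{A}_2 + \mathcal{R}_\zeta + \mathcal{R}_v + \mathcal{Y} + \mathcal{Y}^*\right)\right]^*\left(I_{N^2M^2} - \mathcal{F}\right)^{-1}\mathrm{vec}(\mathcal{R}_u) \tag{138}$$

where $\mathcal{R}_u$ is defined in (89). Observe that the main difference relative to (88) and (90) is the addition of the term $\mathcal{R}_\zeta$. Therefore, all the results that were derived in the earlier section, such as (95) and (96), continue to hold by adding $\mathcal{R}_\zeta$. In particular, if Assumptions 4.1 and 5.2 are adopted, expressions (137) and (138) can be approximated as

$$\mathrm{MSD}_{\mathrm{trk}} \approx \frac{1}{N}\left[\mathrm{vec}\left(\mathcal{A}_2^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A}_2 + \mathcal{R}_\zeta + \mathcal{R}_v\right)\right]^*\left(I_{N^2M^2} - \mathcal{F}\right)^{-1}\mathrm{vec}(I_{NM}) \tag{139}$$

and

$$\mathrm{EMSE}_{\mathrm{trk}} \approx \frac{1}{N}\left[\mathrm{vec}\left(\mathcal{A}_2^T\mathcal{M}\mathcal{S}\mathcal{M}\mathcal{A}_2 + \mathcal{R}_\zeta + \mathcal{R}_v\right)\right]^*\left(I_{N^2M^2} - \mathcal{F}\right)^{-1}\mathrm{vec}(\mathcal{R}_u) \tag{140}$$

where $\mathcal{R}_v$ is now given in (93).

Fig. 2. A network topology with N = 20 nodes.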

VIII. Simulation Results 

We simulate two scenarios: noisy information exchanges and non-stationary environments. We consider a connected network with $N = 20$ nodes. The network topology is shown in Fig. 2.

A. Imperfect Information Exchange 

The unknown complex parameter $w^o$ of length $M = 2$ is randomly generated; its value is $[0.3750 + j2.0834,\ 0.7174 + j1.4123]$. We adopt uniform step-sizes, $\{\mu_k = 0.01\}$, and uniformly white Gaussian regression data with covariance matrices $\{R_{u,k} = \sigma_{u,k}^2 I_M\}$, where $\{\sigma_{u,k}^2\}$ are shown in Fig. 3a. The variances of the model noises, $\{\sigma_{v,k}^2\}$, are randomly generated and shown in Fig. 3b. We also use white Gaussian link noise signals such that $R_{v,lk}^{(w)} = \sigma_{w,lk}^2 I_M$, $R_{v,lk}^{(u)} = \sigma_{u,lk}^2 I_M$, and $R_{v,lk}^{(\psi)} = \sigma_{\psi,lk}^2 I_M$. All link noise variances, $\{\sigma_{w,lk}^2, \sigma_{v,lk}^2, \sigma_{u,lk}^2, \sigma_{\psi,lk}^2\}$, are randomly generated and illustrated in Fig. 4 from top to bottom. We assign the link number by the following procedure. We denote the link from node $l$ to node $k$ as $\ell_{l,k}$, where $l \ne k$. Then, we collect the links $\{\ell_{l,k},\ l \in \mathcal{N}_k\backslash\{k\}\}$ in an ascending order of $l$ in the list $\mathcal{L}_k$ (which is a set with ordered elements) for each node $k$. For example, node $k = 2$ in Fig. 2 has 6 links; the ordered links are then collected in $\mathcal{L}_2 = \{\ell_{5,2}, \ell_{6,2}, \ell_{7,2}, \ell_{13,2}, \ell_{15,2}, \ell_{20,2}\}$. We concatenate $\{\mathcal{L}_k\}$ in an ascending order of $k$ to get the overall list $\mathcal{L} = \{\mathcal{L}_1, \mathcal{L}_2, \ldots, \mathcal{L}_N\}$. Eventually, the $m$th link in the network is given by the $m$th element in the list $\mathcal{L}$.
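The link-numbering procedure above can be sketched as follows. The function name `ordered_links` and the toy neighborhoods are illustrative only; the single-node example reuses the neighbors of node 2 listed in the text, and a link $\ell_{l,k}$ is represented by the pair `(l, k)`.

```python
def ordered_links(neighbors):
    """Concatenate the per-node link lists L_k in ascending order of k.

    neighbors maps each node k to the set N_k (which may contain k itself).
    Within each L_k the links are sorted in ascending order of the source node l.
    """
    links = []
    for k in sorted(neighbors):
        links.extend((l, k) for l in sorted(neighbors[k]) if l != k)
    return links


# Toy 3-node line network 1 - 2 - 3 (hypothetical).
toy = {1: {1, 2}, 2: {1, 2, 3}, 3: {2, 3}}
toy_links = ordered_links(toy)

# Neighborhood of node 2 as given in the text for Fig. 2.
node2_links = ordered_links({2: {2, 5, 6, 7, 13, 15, 20}})

# The m-th link (counting from 1) is the (m-1)-th list element.
third_link = toy_links[3 - 1]
print(toy_links, node2_links, third_link)
```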

We examine the simplified CTA and ATC algorithms in (3) and (4), namely, no sharing of data among nodes (i.e., $C = I_N$), under various combination rules: (i) the relative variance rule in (124), (ii) the Metropolis rule in [39]:

$$\begin{cases}
a_{lk} = \dfrac{1}{\max\{|\mathcal{N}_k|, |\mathcal{N}_l|\}}, & l \in \mathcal{N}_k\backslash\{k\} \\
a_{kk} = 1 - \sum\limits_{l\in\mathcal{N}_k\backslash\{k\}} a_{lk}, & l = k \\
a_{lk} = 0, & l \notin \mathcal{N}_k
\end{cases} \tag{141}$$

where $|\mathcal{N}_k|$ denotes the degree of node $k$ (including the node itself), (iii) the uniform weighting rule:

$$\begin{cases}
a_{lk} = \dfrac{1}{|\mathcal{N}_k|}, & l \in \mathcal{N}_k \\
a_{lk} = 0, & l \notin \mathcal{N}_k
\end{cases} \tag{142}$$

and (iv) the adaptive rule in (128) with $\{\nu_k = 0.05\}$. We plot the network MSD and EMSE learning curves for ATC algorithms in Figs. 5a and 5c by averaging over 50 experiments. For CTA algorithms, we plot their network MSD and EMSE learning curves in Figs. 5b and 5d, also by averaging over 50 experiments. Moreover, we also plot their theoretical results (95) and (96) in the same figures. From Fig. 5 we see that the relative variance rule makes diffusion algorithms achieve the lowest MSD and EMSE levels at steady-state, compared to the Metropolis and uniform rules as well as the algorithm from [33] (which also requires knowledge of the noise variances). In addition, the adaptive rule attains MSD and EMSE levels that are only slightly larger than those of the relative variance rule, although, as expected, it converges more slowly due to the additional learning step (127).
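For reference, the Metropolis rule (141) and the uniform rule (142) can be generated directly from the network adjacency pattern. The sketch below assumes a boolean matrix `adj` with `adj[l, k]` true when $l \in \mathcal{N}_k$ (diagonal included); the function names and the three-node line network are illustrative and are not the 20-node topology of Fig. 2.

```python
import numpy as np


def metropolis_weights(adj):
    """Metropolis rule (141); entry [l, k] of the result is a_{lk}."""
    adj = np.asarray(adj, dtype=bool)
    n = adj.shape[0]
    deg = adj.sum(axis=0)                  # |N_k|, counting the node itself
    A = np.zeros((n, n))
    for k in range(n):
        for l in range(n):
            if l != k and adj[l, k]:
                A[l, k] = 1.0 / max(deg[k], deg[l])
        A[k, k] = 1.0 - A[:, k].sum()      # diagonal entry is still zero here
    return A


def uniform_weights(adj):
    """Uniform rule (142): a_{lk} = 1/|N_k| for l in N_k, zero otherwise."""
    adj = np.asarray(adj, dtype=float)
    return adj / adj.sum(axis=0, keepdims=True)


# Hypothetical line network 1 - 2 - 3.
adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]], dtype=bool)
A_metropolis = metropolis_weights(adj)
A_uniform = uniform_weights(adj)
print(A_metropolis)
print(A_uniform)
```

Both matrices are left-stochastic; on an undirected graph the Metropolis matrix is in addition symmetric, and therefore doubly stochastic.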



B. Non-stationary Scenario 

Fig. 3. The variance profiles for regression data and measurement noises. (a) The variance profile of regression data. (b) The variance profile of measurement noises.

Fig. 4. The variance profiles for various sources of link noises, including $\{\sigma_{w,lk}^2, \sigma_{v,lk}^2, \sigma_{u,lk}^2, \sigma_{\psi,lk}^2\}$.

Fig. 5. Simulated network MSD and EMSE curves and theoretical results (95) and (96) for diffusion algorithms with various combination rules under noisy information exchange. (c) Network EMSE curves for ATC algorithms. (d) Network EMSE curves for CTA algorithms.

The value for each entry of the complex parameter $w^o_i = \mathrm{col}\{w^o_{i,1}, w^o_{i,2}\}$ is assumed to be changing over time along a circular trajectory in the complex plane, as shown in Fig. 6. The dynamic model for $w^o_i$ is expressed as $w^o_{i,m} = e^{j\omega}w^o_{i-1,m}$, where $m = 1, 2$, $\omega = 2\pi/6000$, and $w^o_{-1} = \mathrm{col}\{1 + j, -1 - j\}$. The covariance matrices $\{R_{u,k}\}$ are randomly generated such that $R_{u,k} \ne R_{u,l}$ when $k \ne l$, but their traces are normalized to be one, i.e., $\mathrm{Tr}(R_{u,k}) = 1$, for all nodes. The variances for the model noises, $\{\sigma_{v,k}^2\}$, are also randomly generated. We examine two different scenarios: the low noise-level case where the average noise variance across the network is $-5$ dB and the noise variances are shown in Fig. 7a, and the high noise-level case where the average variance is 25 dB and the variances are shown in Fig. 7b. We simulate 3000 iterations and average over 20 experiments in Figs. 6a and 6b for each case. The step-size is 0.01 and uniform across the network. For simplicity, we adopt the simplified ATC algorithm where $C = I_N$, and only use the uniform weighting rule (142). The tracking behavior of the network, denoted as $\bar{w}_i = \mathrm{col}\{\bar{w}_{i,1}, \bar{w}_{i,2}\}$, is obtained by averaging over all the estimates, $\{w_{k,i}\}$, across the network. Figs. 6a and 6b depict the complex plane; the horizontal axis is the real axis and the vertical axis is the imaginary axis. Therefore, for every time $i$, each entry of $w^o_i$ or $\bar{w}_i$ represents a point in the plane. When $i$ is increasing, $w^o_{i,1}$ moves along the red trajectory (in ○), $w^o_{i,2}$ along the blue trajectory (in □), $\bar{w}_{i,1}$ along the green trajectory (in +), and $\bar{w}_{i,2}$ along the magenta trajectory (in ×). From Fig. 6 it can be seen that diffusion algorithms exhibit the tracking ability in both high and low noise-level environments.
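The circular trajectory of the target vector in this scenario can be reproduced with a few lines. The sketch below only generates $w^o_i$ from the stated model $w^o_{i,m} = e^{j\omega}w^o_{i-1,m}$ with $\omega = 2\pi/6000$ over 3000 iterations; it does not simulate the network itself, and the variable names are illustrative.

```python
import numpy as np

omega = 2 * np.pi / 6000
w_init = np.array([1 + 1j, -1 - 1j])       # w^o_{-1} from the text
num_iter = 3000

w = w_init.copy()
traj = np.empty((num_iter, 2), dtype=complex)
for i in range(num_iter):
    w = np.exp(1j * omega) * w             # rotate both entries by omega
    traj[i] = w

# After 3000 steps the accumulated phase is pi, i.e. half a revolution.
print(traj[-1])
```

Each entry stays on a circle of radius $\sqrt{2}$ centered at the origin, so the simulated window covers exactly half of the circle.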



IX. Conclusions

In this work we investigated the performance of diffusion algorithms under several sources of noise during information exchange and under non-stationary environments. We first showed that, on one hand, the link noise over the regression data biases the estimators and deteriorates the conditions for mean and mean-square convergence. On the other hand, diffusion strategies can still stabilize the mean and mean-square convergence of the network with noisy information exchange. We derived analytical expressions for the network MSD and EMSE and used these expressions to motivate the choice of combination weights that help ameliorate the effect of information-exchange noise and improve network performance. We also extended the results to the non-stationary scenario where the unknown parameter $w^o$ is changing over time. Simulation results illustrate the theoretical findings and match the theory well.

Appendix A 
Stability of $\mathcal{A}_2^T(I_{NM} - \mathcal{M}\mathcal{R}')\mathcal{A}_1^T$

Following [15], we first define the block maximum norm of a vector.

Fig. 7. The noise variance profiles for two cases. (a) The variance profile for low noise-level. (b) The variance profile for high noise-level.

Definition A.1 (Block Maximum Norm): Given a vector $x = \mathrm{col}\{x_1, \ldots, x_N\} \in \mathbb{C}^{MN}$ consisting of $N$ blocks $\{x_k \in \mathbb{C}^M, k = 1, \ldots, N\}$, the block maximum norm is the real function $\|\cdot\|_{b,\infty}: \mathbb{C}^{MN} \to \mathbb{R}$, defined as

$$\|x\|_{b,\infty} \triangleq \max_{1\le k\le N}\|x_k\|_2 \tag{143}$$

where $\|\cdot\|_2$ denotes the standard 2-norm on $\mathbb{C}^M$. ■

Similarly, we define the matrix norm that is induced by the block maximum norm as follows:

Definition A.2 (Block Maximum Matrix Norm): Given a block matrix $\mathcal{A} \in \mathbb{C}^{MN\times MN}$ with block size $M \times M$, then

$$\|\mathcal{A}\|_{b,\infty} \triangleq \max_{x\in\mathbb{C}^{MN}\backslash\{0\}}\frac{\|\mathcal{A}x\|_{b,\infty}}{\|x\|_{b,\infty}} \tag{144}$$

denotes the induced block maximum (matrix) norm on $\mathbb{C}^{MN\times MN}$. ■

Lemma A.3: The block maximum matrix norm is block unitary invariant, i.e., given a block diagonal unitary matrix $\mathcal{U} = \mathrm{diag}\{U_1, \ldots, U_N\} \in \mathbb{C}^{MN\times MN}$ consisting of $N$ unitary blocks $\{U_k \in \mathbb{C}^{M\times M}, k = 1, \ldots, N\}$, where $U_kU_k^* = U_k^*U_k = I_M$, for any matrix $\mathcal{A} \in \mathbb{C}^{MN\times MN}$, then

$$\|\mathcal{A}\|_{b,\infty} = \|\mathcal{U}\mathcal{A}\mathcal{U}^*\|_{b,\infty} \tag{145}$$

where $\|\cdot\|_{b,\infty}$ denotes the block maximum matrix norm on $\mathbb{C}^{MN\times MN}$ with block size $M \times M$. ■

Lemma A.4: Let $A \in \mathbb{C}^{N\times N}$ be a right-stochastic matrix. Then, for block size $M \times M$,

$$\|A \otimes I_M\|_{b,\infty} = 1 \tag{146}$$

Proof: From Definition A.2, we get

$$\begin{aligned}
\|A \otimes I_M\|_{b,\infty} &= \max_{x\in\mathbb{C}^{MN}\backslash\{0\}}\frac{\max_l\left\|\sum_{k=1}^{N}[A]_{lk}x_k\right\|_2}{\max_k\|x_k\|_2} \\
&\le \max_{x\in\mathbb{C}^{MN}\backslash\{0\}}\frac{\max_l\sum_{k=1}^{N}[A]_{lk}\|x_k\|_2}{\max_k\|x_k\|_2} \\
&\le \max_{x\in\mathbb{C}^{MN}\backslash\{0\}}\frac{\max_l\left(\sum_{k=1}^{N}[A]_{lk}\right)\cdot\max_k\|x_k\|_2}{\max_k\|x_k\|_2} \\
&= \max_{x\in\mathbb{C}^{MN}\backslash\{0\}}\frac{\max_l 1\cdot\max_k\|x_k\|_2}{\max_k\|x_k\|_2} \\
&= 1
\end{aligned} \tag{147}$$

where $x = \mathrm{col}\{x_1, \ldots, x_N\} \in \mathbb{C}^{MN}$ consists of $N$ blocks $\{x_k \in \mathbb{C}^M, k = 1, \ldots, N\}$, and $[A]_{lk}$ denotes the $(l, k)$th entry of $A$. On the other hand, any induced matrix norm, say, the block maximum norm, is always lower bounded by the spectral radius of the matrix [49]:

$$\|A \otimes I_M\|_{b,\infty} \ge \rho(A \otimes I_M) = \rho(A) = 1 \tag{148}$$

Combining (147) and (148) completes the proof. ■
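Lemma A.4 can be probed numerically through Definitions A.1 and A.2. Since the induced norm is a maximum over all nonzero vectors, the sketch below does not compute it exactly; it only checks, under randomly chosen inputs, that the ratio in (144) never exceeds one and that it equals one for a vector whose blocks are all identical. The helper name `block_max_norm`, the sizes, and the seed are assumptions of the example.

```python
import numpy as np


def block_max_norm(x, M):
    """Block maximum norm (143) of a vector made of length-M blocks."""
    return np.linalg.norm(x.reshape(-1, M), axis=1).max()


rng = np.random.default_rng(2)
N, M = 4, 3
A = rng.random((N, N))
A /= A.sum(axis=1, keepdims=True)          # rows sum to one: right-stochastic
K = np.kron(A, np.eye(M))                  # A kron I_M

# Random probes: every ratio ||Kx|| / ||x|| should be at most one.
ratios = []
for _ in range(1000):
    x = rng.standard_normal(N * M) + 1j * rng.standard_normal(N * M)
    ratios.append(block_max_norm(K @ x, M) / block_max_norm(x, M))

# A vector with identical blocks attains the value one.
e = rng.standard_normal(M)
x_eq = np.kron(np.ones(N), e)
ratio_eq = block_max_norm(K @ x_eq, M) / block_max_norm(x_eq, M)
print(max(ratios), ratio_eq)
```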

Lemma A.5: Let $\mathcal{A} \in \mathbb{C}^{NM\times NM}$ be a block diagonal Hermitian matrix with block size $M \times M$. Then the block maximum norm of the matrix $\mathcal{A}$ is equal to its spectral radius, i.e.,

$$\|\mathcal{A}\|_{b,\infty} = \rho(\mathcal{A}) \tag{149}$$

Proof: Denote the $k$th $M \times M$ submatrix on the diagonal of $\mathcal{A}$ by $A_k$. Let $A_k = U_k\Lambda_kU_k^*$ be the eigen-decomposition of $A_k$, where $U_k \in \mathbb{C}^{M\times M}$ is unitary and $\Lambda_k \in \mathbb{R}^{M\times M}$ is diagonal. Define the block unitary matrix $\mathcal{U} = \mathrm{diag}\{U_1, \ldots, U_N\}$ and the diagonal matrix $\Lambda = \mathrm{diag}\{\Lambda_1, \ldots, \Lambda_N\}$. Then, $\mathcal{A} = \mathcal{U}\Lambda\mathcal{U}^*$. By Lemma A.3, the block maximum norm of $\mathcal{A}$ with block size $M \times M$ is

$$\begin{aligned}
\|\mathcal{A}\|_{b,\infty} &= \|\mathcal{U}\Lambda\mathcal{U}^*\|_{b,\infty} \\
&= \|\Lambda\|_{b,\infty} \\
&= \max_{x\in\mathbb{C}^{MN}\backslash\{0\}}\frac{\max_k\|\Lambda_kx_k\|_2}{\max_k\|x_k\|_2} \\
&\le \max_{x\in\mathbb{C}^{MN}\backslash\{0\}}\frac{\max_k\|\Lambda_k\|_2\cdot\|x_k\|_2}{\max_k\|x_k\|_2} \\
&\le \max_{x\in\mathbb{C}^{MN}\backslash\{0\}}\frac{\max_k\|\Lambda_k\|_2\cdot\max_k\|x_k\|_2}{\max_k\|x_k\|_2} \\
&= \max_k\|\Lambda_k\|_2 \\
&= \rho(\mathcal{A})
\end{aligned} \tag{150}$$

where we used the fact that the induced 2-norm is identical to the spectral radius for Hermitian matrices [49]. On the other hand, any matrix norm is lower bounded by the spectral radius [49], i.e.,

$$\rho(\mathcal{A}) \le \|\mathcal{A}\|_{b,\infty} \tag{151}$$

Combining (150) and (151) completes the proof. ■

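Now we show that the matrix $\mathcal{A}_2^T(I_{NM} - \mathcal{M}\mathcal{R}')\mathcal{A}_1^T$ is stable if $I_{NM} - \mathcal{M}\mathcal{R}'$ is stable. For any induced matrix norm, say, the block maximum norm with block size $M \times M$, we have [49]

$$\begin{aligned}
\rho\left(\mathcal{A}_2^T(I_{NM} - \mathcal{M}\mathcal{R}')\mathcal{A}_1^T\right) &\le \|\mathcal{A}_2^T(I_{NM} - \mathcal{M}\mathcal{R}')\mathcal{A}_1^T\|_{b,\infty} \\
&\le \|\mathcal{A}_2^T\|_{b,\infty}\cdot\|I_{NM} - \mathcal{M}\mathcal{R}'\|_{b,\infty}\cdot\|\mathcal{A}_1^T\|_{b,\infty} \\
&= \|I_{NM} - \mathcal{M}\mathcal{R}'\|_{b,\infty}
\end{aligned} \tag{152}$$

where, from © and (22), $\mathcal{A}_1^T$ and $\mathcal{A}_2^T$ satisfy Lemma A.4. By (11) and (55), it is straightforward to see that $I_{NM} - \mathcal{M}\mathcal{R}'$ is block diagonal with block size $M \times M$. Then, by Lemma A.5, expression (152) can be further expressed as

$$\rho\left(\mathcal{A}_2^T(I_{NM} - \mathcal{M}\mathcal{R}')\mathcal{A}_1^T\right) \le \rho(I_{NM} - \mathcal{M}\mathcal{R}') \tag{153}$$

which completes the proof.¹

¹This statement fixes the argument that appeared in Appendix I of [11] and Lemma 2 of [12]. Since the matrix $X$ in Appendix I of [11] and the matrix $M$ in Lemma 2 of [12] are block diagonal, the $\|\cdot\|_\rho$ norm used in these references should simply be replaced by the $\|\cdot\|_{b,\infty}$ norm used here and as already done in [15].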
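The bound (153) and the statement of Lemma A.5 can both be checked on a random instance. In the sketch below the block diagonal matrix `D` plays the role of $I_{NM} - \mathcal{M}\mathcal{R}'$, with each block built from a hypothetical step-size and a random Hermitian positive semi-definite matrix; these stand-ins, the sizes, and the seed are assumptions of the example rather than quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 5, 2


def left_stochastic(n):
    """Random nonnegative n x n matrix whose columns each sum to one."""
    A = rng.random((n, n))
    return A / A.sum(axis=0, keepdims=True)


def spectral_radius(X):
    return np.abs(np.linalg.eigvals(X)).max()


mu = rng.uniform(0.01, 0.1, N)             # hypothetical step-sizes
D = np.zeros((N * M, N * M), dtype=complex)
block_norms = []
for k in range(N):
    T = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
    Dk = np.eye(M) - mu[k] * (T @ T.conj().T)   # Hermitian k-th diagonal block
    D[k * M:(k + 1) * M, k * M:(k + 1) * M] = Dk
    block_norms.append(np.linalg.norm(Dk, 2))   # induced 2-norm of the block

A1 = np.kron(left_stochastic(N), np.eye(M))
A2 = np.kron(left_stochastic(N), np.eye(M))
B = A2.T @ D @ A1.T

rho_B, rho_D = spectral_radius(B), spectral_radius(D)
print(rho_B, rho_D, max(block_norms))
```

The largest block 2-norm coincides with $\rho$ of the block diagonal matrix, as Lemma A.5 states, and the combination steps do not increase the spectral radius.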
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Appendix B 
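Proof of expression (75)

Let us denote the $(l, k)$th submatrix of $\mathcal{R}_z$ by $R_{z,lk} \in \mathbb{C}^{M\times M}$. By Assumption 2.1 and expression (44), $R_{z,lk}$ can be evaluated as

$$\begin{aligned}
R_{z,lk} &= \mathbb{E}\,z_{l,i}z_{k,i}^* \\
&= \sum_{m\in\mathcal{N}_l}\sum_{n\in\mathcal{N}_k}c_{ml}c_{nk}\,\mathbb{E}\left(u_{ml,i}^*v_{ml}(i)v_{nk}^*(i)u_{nk,i}\right) \\
&= \sum_{m\in\mathcal{N}_l}\sum_{n\in\mathcal{N}_k}c_{ml}c_{nk}\,R_{ml,nk}
\end{aligned} \tag{154}$$

where $R_{ml,nk} \triangleq \mathbb{E}\left(u_{ml,i}^*v_{ml}(i)v_{nk}^*(i)u_{nk,i}\right)$ and, by expressions (26) and (38),

$$R_{ml,nk} = \mathbb{E}\left(u_{m,i} + v_{ml,i}^{(u)}\right)^*\left(v_m(i) + v_{ml}^{(d)}(i) - v_{ml,i}^{(u)}w^o\right)\left(v_n(i) + v_{nk}^{(d)}(i) - v_{nk,i}^{(u)}w^o\right)^*\left(u_{n,i} + v_{nk,i}^{(u)}\right) \tag{155}$$

When $m \ne n$, expression (155) reduces to

$$R_{ml,nk} = R_{v,ml}^{(u)}w^ow^{o*}R_{v,nk}^{(u)} \tag{156}$$

When $m = n$, expression (155) becomes

$$\begin{aligned}
R_{ml,mk} ={}& \left(\sigma_{v,m}^2 + \delta_{lk}\sigma_{v,ml}^2\right)\left(R_{u,m} + \delta_{lk}R_{v,ml}^{(u)}\right) + \delta_{lk}\left(w^{o*}R_{v,ml}^{(u)}w^o\right)R_{u,m} + R_{v,ml}^{(u)}w^ow^{o*}R_{v,mk}^{(u)} \\
&+ \delta_{lk}\left(\mathbb{E}\,v_{ml,i}^{(u)*}v_{ml,i}^{(u)}w^ow^{o*}v_{ml,i}^{(u)*}v_{ml,i}^{(u)} - R_{v,ml}^{(u)}w^ow^{o*}R_{v,ml}^{(u)}\right)
\end{aligned} \tag{157}$$

where $\delta_{lk}$ denotes the Kronecker delta function. Evaluating the last term on the RHS of (157) requires knowledge of the excess kurtosis of $v_{ml,i}^{(u)}$, which is generally not available. In order to proceed, we invoke a separation principle to approximate it as

$$\mathbb{E}\,v_{ml,i}^{(u)*}v_{ml,i}^{(u)}w^ow^{o*}v_{ml,i}^{(u)*}v_{ml,i}^{(u)} \approx R_{v,ml}^{(u)}w^ow^{o*}R_{v,ml}^{(u)} \tag{158}$$

Substituting (158) into (157) leads to

$$\begin{aligned}
R_{ml,mk} &\approx \left(\sigma_{v,m}^2 + \delta_{lk}\sigma_{v,ml}^2\right)\left(R_{u,m} + \delta_{lk}R_{v,ml}^{(u)}\right) + \delta_{lk}\left(w^{o*}R_{v,ml}^{(u)}w^o\right)R_{u,m} + R_{v,ml}^{(u)}w^ow^{o*}R_{v,mk}^{(u)} \\
&= \sigma_{v,m}^2R_{u,m} + R_{v,ml}^{(u)}w^ow^{o*}R_{v,mk}^{(u)} + \delta_{lk}\left[\left(\sigma_{v,ml}^2 + w^{o*}R_{v,ml}^{(u)}w^o\right)R_{u,m} + \left(\sigma_{v,m}^2 + \sigma_{v,ml}^2\right)R_{v,ml}^{(u)}\right]
\end{aligned} \tag{159}$$

From (156) and (159), we get

$$R_{ml,nk} \approx R_{v,ml}^{(u)}w^ow^{o*}R_{v,nk}^{(u)} + \delta_{mn}\sigma_{v,m}^2R_{u,m} + \delta_{mn}\delta_{lk}\left[\left(\sigma_{v,ml}^2 + w^{o*}R_{v,ml}^{(u)}w^o\right)R_{u,m} + \left(\sigma_{v,m}^2 + \sigma_{v,ml}^2\right)R_{v,ml}^{(u)}\right] \tag{160}$$

Substituting (160) into (154), we obtain

$$\begin{aligned}
R_{z,lk} \approx{}& \left(\sum_{m\in\mathcal{N}_l}c_{ml}R_{v,ml}^{(u)}\right)w^ow^{o*}\left(\sum_{n\in\mathcal{N}_k}c_{nk}R_{v,nk}^{(u)}\right) + \sum_{m\in\mathcal{N}_l}\sum_{n\in\mathcal{N}_k}\delta_{mn}c_{ml}c_{nk}\sigma_{v,m}^2R_{u,m} \\
&+ \delta_{lk}\sum_{m\in\mathcal{N}_l}c_{ml}^2\left[\left(\sigma_{v,ml}^2 + w^{o*}R_{v,ml}^{(u)}w^o\right)R_{u,m} + \left(\sigma_{v,m}^2 + \sigma_{v,ml}^2\right)R_{v,ml}^{(u)}\right]
\end{aligned} \tag{161}$$

From (58)-(59) and (76)-(78), we arrive at expression (75).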
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